engineering

Postgres Full-Text Search: tsvector, GIN Indexes, and When to Stop Before Elasticsearch

Anethoth

14 May 2026 — 5 min read

Search shows up early in the life of most SaaS products. A customer wants to find an invoice by the customer name on it. A team wants to filter feature flags by partial substring of the description. A developer wants to grep through captured webhook payloads. The reflexive answer is to add a search service: Elasticsearch, Meilisearch, Typesense, or one of the hosted alternatives. The reflex is sometimes right, but it often skips a question that should be asked first: can Postgres do this?

Postgres has full-text search built in. The implementation has been mature since version 8.3 in 2008. It handles up to several million documents on a single instance, supports the standard features of stemming, stop words, weights, ranking, and phrase search, and integrates with the rest of the schema without requiring a separate system to operate. We use Postgres full-text search across DocuMint, CronPing, FlagBit, and WebhookVault, and the cases where it is the right answer are broader than most teams assume.

The basic mechanism: tsvector and tsquery

Postgres represents searchable text as a tsvector, which is a sorted list of distinct lexemes (normalized word stems) along with their positions in the original text. The query is represented as a tsquery, which is a boolean expression of lexemes connected by AND, OR, NOT, and phrase operators. A match is checked with the @@ operator, which returns true if the query matches the vector.

The basic schema looks like this:

CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  body TEXT NOT NULL,
  search_vector TSVECTOR GENERATED ALWAYS AS (
    setweight(to_tsvector('english', title), 'A') ||
    setweight(to_tsvector('english', body), 'B')
  ) STORED
);

CREATE INDEX documents_search_idx ON documents USING GIN (search_vector);

The search_vector column is a generated column (which we covered in detail in our piece on Postgres generated columns) that combines the title and body, with the title weighted higher than the body for ranking purposes. The GIN index makes searches fast even on large tables. Queries look like this:

SELECT id, title, ts_rank(search_vector, query) AS rank
FROM documents, plainto_tsquery('english', $1) query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

The plainto_tsquery function parses a user-supplied search string into a query, treating spaces as AND operators. There is also websearch_to_tsquery, which handles quoted phrases, OR operators, and negation in a syntax that matches what users expect from a web search box. The ts_rank function computes a relevance score based on lexeme frequency and weight.

What the features cover

The set of features that ships with Postgres full-text search is broader than most teams realize. Stemming reduces "running", "ran", and "runs" to a common root for matching. Stop words remove common words like "the" and "of" from the index to save space and improve relevance. Weights allow different parts of a document to contribute differently to ranking: a match in the title counts more than a match in the body. Phrase search via <-> finds adjacent lexemes. The headline function ts_headline generates snippet text with matched terms highlighted.

Language support is built in for 22 languages including English, French, German, Russian, Spanish, and Mandarin. The configuration affects stemming and stop-word lists; choosing the wrong language degrades relevance but does not break the search. For multilingual content, the configuration can be stored per-row and applied dynamically.

Phonetic search, fuzzy matching, and autocomplete are not part of the core full-text search system but are available through extensions. The pg_trgm extension provides trigram-based similarity matching for typo-tolerant search and fuzzy autocomplete. The fuzzystrmatch extension provides Levenshtein distance and Soundex matching. Combining these with full-text search gives you most of the search features that justify reaching for a separate search service.

The performance envelope

The honest performance answer is that Postgres full-text search handles up to roughly 10 million documents on a properly-sized instance, with query latencies in the low milliseconds for typical queries. The limits depend on document size, query complexity, hardware, and how aggressively you use GIN-specific tuning parameters like gin_pending_list_limit.

The GIN index is what makes this work. Without it, full-text search degrades to sequential scan, which is fine for hundreds of documents and useless for millions. The GIN index size is typically 30-50% of the indexed text size, which is similar to Elasticsearch's index sizing but without the operational overhead of running a separate system. GIN indexes have a known weakness: index updates are expensive, which manifests as slow writes on tables with frequent edits. The gin_pending_list_limit parameter trades query-time work for write-time work and can be tuned per table.

For tables with hundreds of thousands of rows or fewer, the performance is unmeasurable: queries return in single-digit milliseconds regardless of index strategy. For tables with millions of rows, the GIN index is essential and queries return in tens of milliseconds. For tables with tens of millions of rows, performance depends heavily on tuning: the right configuration can keep queries under 100ms; the wrong configuration can produce query times of seconds.

Where Postgres full-text search is not enough

The honest cases for a dedicated search service start at roughly 50 million documents, or when query latency requirements are below 10ms at high concurrency, or when the search experience requires features that Postgres does not provide cleanly. The features that genuinely benefit from a dedicated service are:

Multi-field relevance tuning with per-field weights and boosts beyond what tsvector weights provide. Elasticsearch's BM25 ranking with per-field analyzers gives more control over relevance than Postgres's ts_rank, which is essentially TF-IDF with weights. For most use cases this does not matter; for cases where relevance tuning is the product, it does.

Faceted search and aggregations as part of the search response. Elasticsearch's aggregations framework returns counts and breakdowns alongside results, which Postgres can produce but with separate queries that add latency. For a search UI with sidebar facets, the round-trip cost of separate queries can dominate.

Geographic and spatial search at scale. Postgres has PostGIS for spatial queries, but combining spatial filtering with full-text search and relevance ranking is awkward. Elasticsearch and Solr handle this combination natively.

Search-as-you-type with sub-50ms latency at high concurrency. Postgres can do this for small datasets, but a dedicated search service with in-memory caching and optimized analyzers handles it more cleanly.

Cross-language search with per-document language detection. Postgres handles multiple languages but requires explicit language selection per query. Elasticsearch can detect language automatically and apply the right analyzer.

The migration path

The right strategy for most teams is to start with Postgres full-text search and graduate to a dedicated service when the requirements demand it. The migration is not free: it requires a separate system to operate, a separate index to keep in sync, and a separate query language to maintain. But it is also not catastrophic: the data layer in Postgres remains canonical, and the search service is a read-only view that can be rebuilt from the data.

The signs that it is time to migrate include: queries that consistently take more than 200ms despite tuning, a search UI that needs features Postgres cannot provide cleanly, a data volume that exceeds what GIN indexes can handle in available RAM, or a team large enough that having a dedicated search engineer makes sense. The signs that it is not time to migrate include: the search feature is used by a small fraction of users, the data volume is under a million documents, the query latency is acceptable on Postgres, or the team is small enough that operating two systems is not worth the gain.

We have not migrated any of our four products to a dedicated search service. The combined searchable content across DocuMint, CronPing, FlagBit, and WebhookVault is well under a million documents, the query patterns are simple, and Postgres full-text search handles every search request with sub-50ms latency. If we ever cross the threshold where a dedicated service becomes the right answer, the migration will be a project, but until then the right answer is the one we already have.

What this is and is not

The honest summary: Postgres full-text search is an underused feature that eliminates an entire system from most SaaS architectures if the data volume is below ten million documents and the search requirements are conventional. It is not a replacement for Elasticsearch in cases where the search experience is the product, but those cases are rarer than most teams assume. The reflexive answer of "add a search service" should always be preceded by the question "what does Postgres give us out of the box?" and the answer to that question is more than most teams expect.

Postgres Full-Text Search: tsvector, GIN Indexes, and When to Stop Before Elasticsearch

Anethoth

The basic mechanism: tsvector and tsquery

What the features cover

The performance envelope

Where Postgres full-text search is not enough

The migration path

What this is and is not

Read more

The Forgotten History of the Bicycle Wheel: How Wire Spokes Made the Modern Wheel Possible

How Wandering Albatrosses Sleep While Flying: The Strange Neural Engineering of Unihemispheric Slow-Wave Sleep

Designing API Webhook Deactivation: When and How to Stop Calling Endpoints That Persistently Fail

Postgres pg_stat_statements_info: Tracking the Statistics Collector's Own Health