Postgres Trigram Indexes: Fast Fuzzy Search Without a Search Service

The default behavior of a B-tree index on a text column is to support prefix matches but nothing else. A query like WHERE name LIKE 'acme%' can use the index because the planner can rewrite the predicate as a range scan. A query like WHERE name LIKE '%acme%' cannot, because there is no prefix to anchor the range. The same applies to ILIKE, to misspelling-tolerant comparisons, and to most of the queries that actual customer-facing search interfaces produce. The reflexive answer is to install a separate search service. Before reaching for one, it is worth asking what Postgres can do natively.

What pg_trgm does

The pg_trgm extension, shipped with the Postgres contrib package since 8.3, decomposes strings into trigrams (three-character sequences) and uses them as the basis for similarity comparisons and accelerated LIKE/ILIKE queries. The string "acme" decomposes into the trigrams " a", " ac", "ace", "cme", "me ", "e " (with padding to handle the start and end). Two strings are more similar when they share more trigrams. The similarity() function returns a value between 0 and 1 reflecting the fraction of shared trigrams.

The basic operators are % (similar above a configurable threshold), <-> (similarity distance for ordering), and the LIKE/ILIKE pattern operators. The threshold defaults to 0.3 and is adjustable via pg_trgm.similarity_threshold. Lower values match more aggressively; higher values require closer matches.

GIN vs GiST trade-offs

The extension supports both GIN and GiST index types over trigram columns, with different trade-offs. GIN indexes are faster for read queries because the inverted index structure maps trigrams directly to row pointers. They are slower to maintain on writes because every distinct trigram update touches multiple posting lists. GiST indexes are smaller and faster to update but slower to query because they require lossy filtering against the heap.

For most workloads, GIN is the right default. The query speedup is typically 10-100x over sequential scan on tables with tens to hundreds of thousands of rows. The write cost is proportional to the average number of trigrams per row, which is typically 30-100 for human-readable names. The disk overhead is 2-5x the column size, larger than a B-tree but acceptable for the search functionality it enables.

GIN has a fastupdate mechanism that batches writes through a pending list before merging into the main index. The pending list is consulted on reads, which keeps writes fast at the cost of slightly slower reads when the pending list is large. Tuning fastupdate and gin_pending_list_limit is a per-workload decision; for write-heavy workloads it is on, for read-heavy workloads with infrequent updates it can be off.

The query patterns that benefit

The most obvious case is unanchored LIKE/ILIKE. A trigram index supports both, because the planner can decompose the search pattern into trigrams and look up rows containing all of them. The pattern WHERE name ILIKE '%acme%' becomes an index probe for the trigrams of "acme" followed by a heap recheck to confirm the exact pattern match.

Similarity search is the second case. WHERE name % 'acme' returns rows above the configured similarity threshold, useful for misspelling-tolerant matching. ORDER BY name <-> 'acme' LIMIT 10 returns the ten closest matches, useful for autocomplete and "did you mean" suggestions. Both patterns are accelerated by the trigram index.

Regular expressions are partially supported through the regexp_matches and posix-regex operators (~, ~*). The acceleration is conservative; complex patterns may fall back to sequential scan. Simple patterns with literal anchors get full index acceleration.

The query patterns that do not benefit

Trigram indexes do not help with short patterns. A pattern of one or two characters has no trigrams, so the index cannot accelerate the query. The planner falls back to sequential scan. The minimum useful pattern length is three characters, and longer patterns generally produce better selectivity.

Exact-match equality on the same column does not need a trigram index. A B-tree index is faster and smaller for equality lookups. If your query mix includes both exact equality and fuzzy search, you may want both indexes on the same column, which doubles the write cost.

Phonetic similarity (Soundex, Metaphone) is not what trigrams measure. Trigram similarity is character-overlap similarity, which catches misspellings well but does not capture sound-alike matches. The fuzzystrmatch extension provides Soundex and similar algorithms; combining fuzzystrmatch with pg_trgm gives broader coverage at the cost of more complex queries.

The autocomplete pattern

The canonical use case for pg_trgm is server-side autocomplete: the user types a prefix or partial string, and the API returns matching results ordered by similarity. The query shape is:

SELECT id, name
FROM organizations
WHERE name % $1
ORDER BY name <-> $1
LIMIT 10;

With a GIN trigram index on organizations(name), this query returns in single-digit milliseconds on tables with millions of rows. The latency is dominated by the index probe and the limit-driven sort, both of which scale well. The user experience is essentially indistinguishable from a dedicated search service.

The threshold matters. Too low (0.1) and the result list is noisy; too high (0.5) and minor misspellings drop out. The right value depends on the data: short organization names need a different threshold than long product descriptions. Tuning is per-column and is worth doing carefully.

When to graduate to a search service

pg_trgm scales well to tens of millions of rows on commodity hardware. Beyond that, the index size and write cost start to dominate. Specific failure modes include: very long text columns (full documents, multi-paragraph descriptions) where the trigram count per row gets large; multi-language workloads where the same trigram has different semantic content in different languages; complex relevance scoring that requires BM25 or TF-IDF rather than simple similarity.

The graduation path is to set up Elasticsearch or OpenSearch or Meilisearch alongside Postgres, with the search service kept current through change data capture or scheduled syncs. The transition is operationally significant: another moving piece, another infrastructure dependency, another failure mode. It should not be the default until pg_trgm has demonstrably stopped scaling.

What pg_trgm does not solve

The extension does not provide relevance scoring beyond simple similarity. There is no concept of inverse-document-frequency weighting, no field boosting, no synonym dictionaries, no language-specific stemming. The Postgres full-text search machinery (to_tsvector, GIN over tsvector) provides those features and is a complement rather than a replacement for pg_trgm. Many production systems use both: pg_trgm for autocomplete and fuzzy matching, tsvector for full-text search with stemming and relevance.

The extension does not provide cross-field search. A query like "find organizations whose name OR address OR description matches the search string" needs either three separate trigram indexes (one per field) combined via UNION, or a concatenated-field expression index, or full-text search with weighted vectors. pg_trgm alone handles single-column matching cleanly and multi-column matching with more work.

Our SQLite-based products

DocuMint, CronPing, FlagBit, and WebhookVault all run on SQLite, which has FTS5 for full-text search but no equivalent to pg_trgm trigram similarity. The patterns that pg_trgm handles cleanly (autocomplete, misspelling-tolerant matching, similarity ordering) we implement at the application layer using fuzzy-string matching libraries with manual trigram-or-Levenshtein scoring. The performance is acceptable for our table sizes but does not scale the way an indexed solution would.

The Postgres migration plan includes pg_trgm as one of the baseline extensions to install at cluster creation. The use cases are CronPing monitor name search (customers with hundreds of monitors), FlagBit flag name autocomplete in the dashboard, WebhookVault endpoint URL search, and DocuMint customer name matching. Each is a small feature individually, but the consistent pattern of "search a list of customer-owned named entities" recurs across products, and pg_trgm handles the pattern with a single extension and an index per column.

The deeper observation

The reflexive answer to "we need search" is often "install a search service," but the reflexive answer is right less often than industry convention suggests. The cases where Postgres native search is sufficient have moved upward over the past decade as the extension ecosystem has matured: pg_trgm for fuzzy matching, tsvector for full-text search, pgvector for semantic similarity. The combination handles a substantial fraction of B2B SaaS search needs without the operational cost of a separate search service. The pattern recurs across other database features: many problems that get separate infrastructure could be solved with extensions that already ship with Postgres, and the gap between what teams use and what is available is often larger than the gap between what is available and what is needed.

Our products: DocuMint (PDF invoice generation API), CronPing (cron job monitoring with status pages), FlagBit (feature flags API for modern teams), and WebhookVault (webhook capture and replay) put these patterns into production.