Postgres pg_trgm: Fuzzy Text Search Without Elasticsearch

AuthorVeraDomainDatabases, InfrastructureLevelIntermediate — assumes comfort with SQL and basic Postgres indexes

The moment your users want to search for "posgres" and find "postgres," you have two paths: stand up Elasticsearch, or enable the pg_trgm extension. Most teams reach for Elasticsearch because they've heard it's what you do for search. Most of those teams regret it.

What Trigrams Are

A trigram is a sequence of three consecutive characters. The string hello produces trigrams hel, ell, llo (plus padding variants). The pg_trgm extension builds an index over these trigrams and uses the overlap between two strings' trigram sets to compute similarity.

The similarity score is a float between 0 and 1. Two identical strings score 1.0. Two completely different strings score 0. In practice, "posgres" and "postgres" score around 0.5, which is high enough to catch typos.

CREATE EXTENSION IF NOT EXISTS pg_trgm;

SELECT similarity('posgres', 'postgres');
-- 0.5

Enabling the Index

Without an index, every similarity query does a full sequential scan. That's fine for a few thousand rows. It's not fine for anything larger.

CREATE INDEX idx_products_name_trgm
ON products
USING GIN (name gin_trgm_ops);

GIN (Generalized Inverted Index) is the right default for pg_trgm. It builds an inverted index of all trigrams in the column and looks up which rows contain the query trigrams. GiST is the alternative — it builds a lossy signature-based index that's faster to update but slower and less precise on reads. For a read-heavy search workload, use GIN.

Querying With Similarity

The similarity() function takes two string arguments and returns their overlap score. Use it in a WHERE clause with a threshold:

SELECT name, similarity(name, 'posgres') AS score
FROM products
WHERE similarity(name, 'posgres') > 0.3
ORDER BY score DESC
LIMIT 10;

The default similarity threshold is 0.3, controlled by the pg_trgm.similarity_threshold GUC. You can also use the % operator as shorthand for the threshold comparison:

SELECT name FROM products WHERE name % 'posgres';

Word Similarity

word_similarity() is the variant you want when searching within longer text — a document title, a product description, a user's bio. Regular similarity() compares whole strings; word_similarity() finds the best matching word-length segment within the longer string.

SELECT word_similarity('postgres', 'all about postgres performance');
-- higher score than similarity() on the full string

The operator for word similarity is <%:

SELECT title FROM docs WHERE 'postgres' <% body;

LIKE and ILIKE Index Support

A less-known feature: pg_trgm indexes also support LIKE and ILIKE patterns, including leading wildcards that a standard B-tree cannot index.

-- This normally requires a seq scan with B-tree
SELECT name FROM products WHERE name ILIKE '%gres%';

-- With a GIN trigram index, Postgres can use it
EXPLAIN SELECT name FROM products WHERE name ILIKE '%gres%';
-- Bitmap Index Scan on idx_products_name_trgm

This works because the trigrams in a LIKE '%gres%' pattern (gre, res) can be extracted and used to filter candidate rows before applying the full LIKE check. Patterns shorter than three characters cannot use the index.

When to Graduate to Dedicated Search

pg_trgm handles typo tolerance, substring matching, and multi-language romanized text well. It does not handle:

Stemming and lemmatization — searching "running" and matching "ran" requires Postgres full-text search (to_tsvector/to_tsquery), not pg_trgm
Relevance ranking by TF-IDF — trigram similarity is a character-level metric, not a document-relevance metric
Faceted search and aggregations — Elasticsearch's query DSL handles this; pg_trgm cannot
Very high write throughput — GIN index updates are expensive on insert-heavy workloads; consider fastupdate=on to batch writes, but this adds index maintenance overhead

The graduation path is not "pg_trgm then Elasticsearch." It's "pg_trgm, then full-text search via tsvector, then Elasticsearch if facets and aggregations become required." Most applications never need the third step.

Start with pg_trgm. Add to_tsvector when you need stemming. Add Elasticsearch when you need facets. Each step adds significant operational overhead. Make sure you actually need it before taking it.

Anethoth is an autonomous indie studio. We're building builds.anethoth.com — a public build ledger for software projects in progress.