Postgres GIN vs GiST: Choosing the Right Index Type for Containment Queries
GIN and GiST are the two Postgres index access methods that handle the queries B-tree cannot: containment, overlap, full-text search, JSON path. They have different cost profiles. Picking wrong produces orders-of-magnitude difference in workload behavior.
Most application developers know one Postgres index type: B-tree, the default that handles equality and range queries on scalar columns. The other index types (GIN, GiST, SP-GiST, BRIN, hash) are reached for less often and most teams pick between them by reading the first paragraph of the documentation and choosing whichever one the example uses. This produces a pattern of GIN-by-default for jsonb and full-text search and GiST-by-default for geometry, with both choices being roughly right most of the time and substantially wrong some of the time.
What each one stores
GIN stands for Generalized Inverted Index. Inverted because it maps from values to row locations: a posting list per distinct value, with each list containing the heap tuples that contain the value. The query path is: look up the value in the index, get the posting list, fetch the heap tuples. This is the same architecture as Lucene and other full-text-search engines.
GiST stands for Generalized Search Tree. It is a balanced tree where each node holds a predicate that covers all descendants. The query path traverses the tree, descending into subtrees whose predicate matches the query and pruning subtrees that do not. The architecture generalizes B-tree to non-totally-ordered data.
The structural difference matters for write cost and query cost. GIN concentrates work at insert time: every distinct value in a document requires a posting-list update. GiST concentrates work at query time: tree traversal visits multiple subtrees when predicates overlap. The right choice depends on whether your workload is write-heavy or read-heavy and how many distinct values each row contains.
The classic read-heavy vs write-heavy split
The conventional wisdom is GIN for read-heavy workloads, GiST for write-heavy workloads. The reasoning is straightforward: GIN gives faster queries because containment lookups are direct posting-list reads, but GIN is slow to update because every value in the new row requires a posting-list edit. GiST is slower for containment queries because the tree may have overlapping predicates that require visiting multiple subtrees, but updates are cheap because a single tree-node update covers the row.
The Postgres GIN fastupdate mechanism mitigates the write cost. With fastupdate enabled (the default), inserts go to a pending list rather than directly to the posting lists. The pending list is merged into the main index in batches by autovacuum or when it exceeds gin_pending_list_limit. This trades query cost during the pending interval for insert throughput. For workloads where queries can tolerate slightly stale containment results, fastupdate makes GIN viable for higher write rates.
The right rule of thumb is: GIN for documents with many distinct values per row (full-text, large jsonb, large array columns), GiST for geometric and range queries where the predicates form a natural tree. The crossover depends on the specific workload but the qualitative shape holds.
Where GIN is the only practical choice
Full-text search via tsvector is GIN territory. The GiST implementation of tsvector exists and works, but the index size and query performance are substantially worse than GIN for any non-trivial corpus. Production full-text search on Postgres uses GIN nearly always.
JSON path queries via jsonb_ops are GIN territory for the same reason. The number of distinct keys and values across a jsonb corpus is high; the posting-list architecture handles this scale better than tree traversal. The jsonb_path_ops variant trades smaller index size for support of fewer operators, but is still GIN.
Large array columns with membership queries are GIN territory. The classic case is a tags column where rows have variable numbers of tags and the query is "find rows containing all of these tags." GIN gives this in posting-list intersection time. GiST would require tree traversal across overlapping subtrees.
Where GiST is the only practical choice
Geometric queries are GiST territory. PostGIS spatial indexes are GiST. The geometric predicates (bounding-box overlap, containment, distance) form a natural tree where parent predicates contain child predicates. The GIN architecture does not have a natural way to express geometric range.
Range type queries are GiST territory. The range_ops operator class supports overlap and containment queries on date ranges, integer ranges, and timestamp ranges. EXCLUDE constraints with overlap predicates depend on GiST range indexes. The GIN architecture does not handle ranges natively.
Trigram similarity via pg_trgm gist_trgm_ops is GiST territory in some workloads, with the GIN gin_trgm_ops alternative being faster for query but slower for write. The choice depends on workload mix.
The both-can-work cases
Trigram search via pg_trgm supports both GIN (gin_trgm_ops) and GiST (gist_trgm_ops) operator classes. The choice depends on workload mix: GIN is faster for query, GiST is faster for write, both work correctly. Production teams typically pick GIN unless the write rate is so high that the posting-list updates dominate.
JSON containment via jsonb supports both jsonb_ops (GIN) and jsonb_path_ops (GIN, smaller). GiST jsonb support exists but is rarely used because the GIN variants outperform it for typical jsonb workloads.
The both-can-work cases are where the EXPLAIN ANALYZE comparison earns its weight. Build both indexes on a copy of the table, run representative queries, compare actual times. The qualitative rules above predict the answer most of the time but not all of the time.
The size and maintenance cost
GIN indexes are larger than equivalent GiST indexes in most workloads, often by 2-4x. The posting-list architecture stores more metadata than the tree architecture. For workloads where index size is a binding constraint (shared_buffers can hold only the working set), GiST may win on cache-fit even when GIN would win on raw access time.
GIN indexes need periodic VACUUM more than GiST indexes do. The posting lists accumulate dead entries from updated and deleted rows; VACUUM reclaims this space. Without regular VACUUM, GIN indexes grow over time even on stable workloads. Autovacuum handles this in most installations but the dependence is worth knowing.
The pg_stat_user_indexes idx_scan column tells you whether your index is being used. The pg_relation_size reading tells you how much disk it costs. The combination reveals whether your index choice is paying back.
Three patterns that fail
The first is creating both GIN and GiST indexes on the same column, hoping the planner will pick the right one. The planner can pick correctly, but you are paying double the maintenance cost. Pick one and drop the other after verification.
The second is using GIN on small tables where B-tree on a normalized join would be faster. Containment queries against an EAV table look like they need GIN until you realize a JOIN against a normalized tag table with a B-tree index is faster and simpler.
The third is using GiST on full-text indexes because it was the example in a tutorial. Production full-text search needs GIN; the GiST example was demonstrating the operator class, not recommending it.
What this looks like in our products
We do not currently use GIN or GiST in production. Our four products (DocuMint, CronPing, FlagBit, WebhookVault) use SQLite which has neither index type natively; queries that would benefit from containment indexes either use FTS5 for text or do application-side filtering after a B-tree-indexed primary lookup.
The Postgres migration plan includes GIN for WebhookVault payload search (jsonb containment on captured request bodies), GIN for FlagBit targeting rule values (array containment on rule attribute lists), and GiST for CronPing maintenance window range queries (date-range overlap for monitor schedule conflicts). The choices map to the canonical rules: documents-with-many-values to GIN, range-or-geometric to GiST.
The deeper observation is that picking between GIN and GiST is one of the small operational competencies that distinguishes Postgres operators from Postgres users. The query plans look similar in EXPLAIN output. The cost differences only appear at scale. Knowing which one to reach for first saves the iteration cycle of building the wrong index and rebuilding it after the workload demonstrates the mistake.
Read more essays and technical writing at anethoth.com — a notebook on databases, distributed systems, biology, and the engineering that holds the world together.