engineering

Postgres GIN Indexes for JSONB: How to Search Inside JSON Without Killing Performance

JSONB columns let you store flexible data, but the moment you need to search inside them, the default B-tree index is useless. GIN indexes solve the search problem at a cost most teams underestimate. The trade-offs deserve the same kind of attention you would give a schema migration.

Anethoth

20 May 2026 — 5 min read

JSONB columns are a Postgres feature that gets used long before its costs get understood. A team adds a metadata JSONB column to a table to avoid a schema migration. Six months later, the query WHERE metadata @> '{"plan": "pro"}' shows up in pg_stat_statements as the slowest query in the database, scanning the entire table on every call. The fix is a GIN index. The follow-up question, almost never asked at the same time, is what the index actually costs.

Why B-tree does not help

A standard B-tree index on a column lets the planner answer equality and range queries in logarithmic time. The index entries are ordered by the column value, and the planner navigates the tree to the matching range.

JSONB does not have a meaningful order in the sense that B-tree needs. Two JSONB values can be unequal, but the ordering between them is implementation-defined (Postgres uses a specific comparison rule, but it does not match the queries you actually want to run). A B-tree index on a JSONB column will be built and can answer queries like WHERE metadata = '{...exact JSON...}', but it cannot answer the queries that justify having JSONB in the first place: containment, key existence, path lookups, full-text within values.

For any query that asks "does this JSONB document contain this key or value," the planner falls back to a sequential scan, evaluating each row's JSONB content against the predicate. On a million-row table, the difference between an indexed lookup and a sequential scan is the difference between a millisecond and several seconds. On a billion-row table it is the difference between operational and not.

What GIN does instead

GIN stands for Generalized Inverted Index. It is the same family of data structure that backs full-text search and array indexing. The structure is an inverted index: for each distinct value (or key, or path) that appears in any indexed row, the index maintains a posting list of which rows contain it.

For JSONB, the default GIN opclass (called jsonb_ops) indexes every key and every value at every depth of every JSONB document in the table. When you query WHERE metadata @> '{"plan": "pro"}', the planner consults the posting list for the key "plan" and the value "pro", intersects them, and gets a list of candidate rows to recheck. The recheck step is necessary because the GIN index records that the key and value both exist somewhere in the document, not that they exist in the same key-value pair.

An alternative opclass, jsonb_path_ops, indexes paths rather than individual keys and values. It is smaller (because it does not separately track keys that appear in multiple paths) and faster for the containment query @>, but it does not support key-existence queries (?, ?|, ?&) at all. The trade-off is workload-dependent. If you only use @>, use jsonb_path_ops. If you need key-existence, use the default jsonb_ops.

The write cost

GIN indexes are dramatically more expensive to update than B-tree indexes. The reasons compound: every distinct value in the inserted or updated JSONB document requires a posting-list update, every posting-list update is a separate index page write, and the index pages are sized for the worst case rather than the typical case.

The naive Postgres GIN implementation would touch dozens of pages per insert. The actual implementation uses a feature called fastupdate, which buffers insert operations in a pending list and merges them into the main index in batches during autovacuum or when the pending list grows. The pending list trades read performance (because queries have to consult both the main index and the pending list) for write throughput.

The fastupdate setting is per-index and defaults to on. The recommendation for write-heavy workloads is to leave it on. The recommendation for read-heavy workloads with infrequent writes is to turn it off (WITH (fastupdate = off)) so that the pending list does not grow and slow down reads. The gin_pending_list_limit server setting controls when the pending list is force-merged.

Even with fastupdate, GIN index writes are several times more expensive than B-tree writes. A high-throughput insert workload on a table with a GIN index will spend a noticeable fraction of its time in index maintenance. The relevant metric is the autovacuum work spent on the index and the WAL volume generated by index updates.

The disk cost

GIN indexes are typically 2-5x larger than the B-tree equivalent on the same data, because the inverted index stores a posting list per distinct value rather than one entry per row. For JSONB columns with high-cardinality content (many distinct values), the ratio can be worse.

The size matters for two reasons. First, large indexes are less likely to fit in shared_buffers, which means more disk I/O for queries that consult them. Second, large indexes take longer to vacuum and longer to rebuild via REINDEX. The standard advice is to monitor the relative size of the GIN index versus the table and consider whether the index is justified.

The diagnostic query for index size is straightforward: SELECT pg_size_pretty(pg_relation_size('idx_name')); compared against pg_size_pretty(pg_relation_size('table_name'));. If the GIN index is larger than the table, it is probably indexing too much. Either switch to jsonb_path_ops, or extract the frequently-queried fields to dedicated columns with B-tree indexes.

The query patterns GIN supports

The GIN index on a JSONB column with the default jsonb_ops opclass supports several query patterns. Containment with @> is the most common: WHERE metadata @> '{"plan": "pro", "active": true}' finds rows where the JSONB document contains all the specified key-value pairs. Key existence with ?: WHERE metadata ? 'plan' finds rows where the document has a top-level "plan" key. Key existence in array form with ?| and ?&: WHERE metadata ?| array['plan', 'tier'] finds rows where the document has either of the specified keys.

The patterns GIN does not directly support include path queries (metadata->'preferences'->>'theme' = 'dark') and value-only existence without a key context. For path queries, the workaround is either to extract the path to a dedicated column with a B-tree index or to use a functional GIN index on the specific path: CREATE INDEX ON table USING GIN ((metadata->'preferences'));. The functional index is smaller than a full-document GIN index and faster for the specific path, but it only helps the specific path.

For path equality on a single value, the right answer is almost always a dedicated B-tree index on a generated column or expression: CREATE INDEX ON table ((metadata->>'plan'));. The expression index is a B-tree on the extracted value and behaves identically to a regular B-tree index for that one path. It is the right choice when one or two paths are queried frequently and the rest of the document is opaque.

The composite case

A common pattern is "find rows where the JSONB document contains X and the regular column equals Y." The naive approach is a single index combining both, but GIN indexes do not combine well with B-tree columns in the same index. The right answer is two separate indexes (one GIN, one B-tree) and let the planner combine them via bitmap index scan, which Postgres does automatically when the cost model favors it.

The diagnostic is to run EXPLAIN on the query and confirm the plan includes "BitmapAnd" combining the two index scans. If the plan only uses one of the indexes, the other one is not contributing and may be removable. If neither index is used and the query is doing a sequential scan, the predicate is probably not sargable in a way the planner can match to either index.

What this looks like across our four products

Our four products (DocuMint for PDF invoice generation, CronPing for cron job monitoring, FlagBit for feature flags, and WebhookVault for webhook debugging) are SQLite-based at current scale. SQLite has a JSON1 extension that supports JSON path expressions but does not have a GIN-equivalent index; the path-extraction pattern with expression indexes is the SQLite-equivalent strategy. For the Postgres migration plan, the JSONB columns in WebhookVault (for webhook payload storage) and FlagBit (for flag rule configurations) are the candidates for GIN indexing; the others use mostly typed columns.

The deeper observation is that JSONB plus GIN is one of those Postgres features that solves a real problem (flexible-schema search) at a cost that is invisible until the production load hits. The teams that get the best results out of it are the ones that understand the cost model before they reach for it, rather than the ones that add the column first and discover the cost during an incident. The pattern of "powerful database feature with non-obvious operational cost" applies equally to logical replication slots, materialized views, full-text search, and dozens of others; JSONB plus GIN is simply one of the more commonly-encountered examples.

Postgres GIN Indexes for JSONB: How to Search Inside JSON Without Killing Performance

Anethoth

Why B-tree does not help

What GIN does instead

The write cost

The disk cost

The query patterns GIN supports

The composite case

What this looks like across our four products

Read more

How Manatees Sense Currents: The Strange Tactile Engineering of Hydrodynamic Vibrissae

The Forgotten History of the Steam Locomotive: How the Iron Horse Compressed Geography

Postgres pg_class and pg_attribute: Reading the System Catalogs Directly

Designing API Webhook Delivery Receipts: The Audit Trail Customers Build Reports From