StackPostgreSQL 14+PersonaBackend engineer storing semi-structured dataProblemJSONB documents are stored but queries scan every row
A team stores user preferences in a JSONB column. The column holds data. Nobody indexes it. Every query does a sequential scan. The table grows. The queries slow down. This is the default outcome when JSONB is used as a dump rather than a queryable structure.
JSONB is not JSON. It stores a binary representation, not raw text. That makes it indexable. The question is whether you use the index.
The operators that matter
JSONB has two distinct operator families.
Path extraction: -> returns a JSONB value, ->> returns text. data->'user'->>'email' extracts the email field as a string. These let you filter by value but do not use a GIN index.
Containment: @> asks whether the left side contains the right side. data @> '{"tier":"premium"}' finds all rows where the JSONB document contains that key-value pair. This operator is what a GIN index is built for.
-- Path extraction (no index unless you add a functional index)
SELECT * FROM users WHERE data->>'tier' = 'premium';
-- Containment (uses GIN index)
SELECT * FROM users WHERE data @> '{"tier": "premium"}';
-- Array containment
SELECT * FROM events WHERE tags @> '["billing"]';
-- Path existence
SELECT * FROM users WHERE data ? 'legacy_id';
The GIN index
A GIN index on a JSONB column enables fast containment queries:
CREATE INDEX CONCURRENTLY idx_users_data_gin ON users USING GIN (data);
This covers @>, ?, ?|, and ?& operators. It does not help with -> path extraction unless you add a separate functional index.
The index is larger than a B-tree because it indexes every key and value in every document. On a column with high cardinality and frequent writes, GIN can slow writes noticeably. The fastupdate parameter (default on) defers updates to a pending list and flushes in batches — it helps write throughput but can cause spikes on the flush.
jsonb_path_ops for containment-only workloads
If you only use @> and never use ? or ?|, use jsonb_path_ops:
CREATE INDEX CONCURRENTLY idx_users_data_path
ON users USING GIN (data jsonb_path_ops);
This index is smaller and faster for containment queries because it hashes the full path rather than indexing each key separately. You lose the ? (key exists) operator. For most analytics workloads, that is a good trade.
Iterating over documents
jsonb_each expands a JSONB object into rows of key-value pairs. jsonb_array_elements expands an array. These are useful for queries that need to aggregate across document structure:
-- Count events by tag across all rows
SELECT tag, count(*)
FROM events, jsonb_array_elements_text(tags) AS t(tag)
GROUP BY tag
ORDER BY count(*) DESC;
This will scan the full table without an index. Use it for analytics queries on small tables or with a filter that narrows the row set first.
When JSONB is not the answer
JSONB is useful for genuinely variable structure: user-defined metadata, plugin configuration, third-party webhook payloads. It is misused when the structure is actually fixed and you are just avoiding schema migrations.
If you query the same three fields in every WHERE clause, put them in columns. A functional index on (data->>'user_id') works, but it is slower to write, harder to maintain, and loses type safety. The JSONB column exists to handle the parts of the document that vary by row. The fixed parts belong in the schema.
Building something? builds.anethoth.com is a public build ledger — prove your product is really being built with a free dossier.