Why Your JSON Columns Make Migrations Impossible: The Schema-on-Read Trap

Column typeJSONB (PostgreSQL)Common patternFlexible feature flags, user settings, varying metadataThe trapSchema-on-read means migration-on-readWhen JSONB IS rightTruly polymorphic data you don't control

The pitch for JSONB columns is compelling: no migrations needed when you add features. Just add a key to your JSON payload and ship. No ALTER TABLE, no deploy coordination, no database downtime, no PR blocked on the migration review. The feature is live the moment you deploy the code.

Eighteen months later, you have a different problem. You have 47 different shapes of the same column across production rows, no way to know which rows have which fields without querying them individually, and application code that has accumulated twelve nested .get() calls for every field access because nobody can remember which rows might be missing which keys.

The migration was not eliminated. It was distributed across every read path in your application, forever.

What schema-on-read actually means

Relational databases enforce schema at write time. When a column is defined as NOT NULL INTEGER, the database rejects any write that doesn't provide a valid integer. When you add a column with ALTER TABLE, you specify a default value that fills existing rows immediately, or you accept NULLs for rows created before the migration. Either way, after the migration completes, every row has a consistent shape. Queries against that column don't need to handle the case where the column doesn't exist.

JSONB with a varying structure is schema-on-read. The database doesn't enforce anything about what keys exist or what types they have. That enforcement happens — must happen — in application code at read time. Every time you read from the column, you handle the possibility that any given key may or may not exist and may or may not be the type you expect.

This is fine when you write the first version of the code. You know the shapes that exist. But the code will be read and modified by people who weren't there for the first version, in contexts where the original intent isn't obvious, in three years when the codebase has accumulated layers of defensive code without anyone fully understanding why.

The backfill impossibility

Suppose you've been storing user preferences as JSONB for two years. Early on, the column looked like {"theme": "light", "notifications": true}. Later, you added locale, then timezone, then weekly_digest. Now you need to add email_frequency, and the correct default depends on whether the user previously had notifications: true and whether their timezone is set to a US timezone.

With a typed column, this is a migration with a CASE expression. With JSONB, it's a backfill script that you need to write, test, and run against production — because you can't express the logic in a DDL migration. The database doesn't know what's in the column. You have to tell it, row by row, in application code.

For millions of rows, this requires a maintenance window or a background job that takes days. For a typed column, ALTER TABLE with a computed default can often run as a metadata-only change that finishes in milliseconds. The pain of ALTER TABLE is front-loaded and visible; the pain of JSONB backfills is deferred and cumulative.

The query performance trap

JSONB has GIN indexes, and @> containment queries use them efficiently. If you need to find all rows where preferences @> '{"theme": "dark"}', a GIN index will help. This is the case JSONB is genuinely built for.

What GIN indexes don't help with:

Sorting: ORDER BY preferences->>'created_at' is a sequential scan. The database cannot use statistics about JSON key distribution for sort ordering.
Range queries: WHERE (preferences->>'age')::int > 30 requires a functional expression index at best, and the optimizer still can't use column statistics for cardinality estimation.
Joins: Joining on a JSON field requires casting and expression evaluation per row. The planner can't use foreign key statistics.
NULL handling: WHERE preferences->>'key' IS NULL matches both rows where the key exists with a null value and rows where the key doesn't exist at all. The semantics are different but the expression doesn't distinguish them.

As soon as you need to filter, sort, or join on a field that you're storing in JSONB, you've paid the performance cost of schema-on-read on every query. The GIN index that makes containment queries fast does nothing for the five other query patterns you'll develop over time.

The ALTER TABLE comparison

ALTER TABLE with a DEFAULT value is painful exactly once. In PostgreSQL 11+, adding a column with a non-NULL constant default is a metadata-only operation — the default is stored in the catalog, not written to existing rows immediately, and queries that access the new column for old rows get the default without reading it from disk. This is nearly free even for billion-row tables.

Adding a column with a complex default still requires a table rewrite (or a staged migration with an expand-and-backfill pattern), but the process is documented, understood, and has known tooling. You run the migration once, it succeeds or fails with a clear error, and afterwards the schema is correct. The application code that reads the column doesn't need to handle the pre-migration case because the database guarantees it doesn't exist anymore.

With JSONB, every deploy adds another case to handle. The application accumulates knowledge of every historical shape the column has ever had, distributed across defensive code that nobody documented because at the time it was obvious.

When JSONB is the right choice

JSONB is genuinely useful. The cases where it's right are specific:

Truly polymorphic data you don't control: Third-party webhook payloads with varying schemas, user-defined form fields where you genuinely can't predict the shape, plugin configuration that varies by plugin type. When the schema varies because the input varies and you don't control the input, schema-on-read is appropriate because there is no other option.

Audit and event logs: If you're storing event payloads where the shape intentionally varies by event type and you only ever read the entire payload back without filtering on individual fields, JSONB is fine. You're trading the payload as an opaque blob, not querying its structure.

Metadata overflow: If you have a well-typed core schema and genuinely optional metadata that you never filter or sort on — tags, custom fields that users define themselves, annotations — a JSONB overflow column next to your typed columns gives you the flexibility you need without corrupting the queryable core.

The hybrid pattern

The most defensible pattern is typed columns for everything you know, with a JSONB extra column for genuine overflow. You get the benefits of schema enforcement and query performance for your predictable fields, plus the flexibility of JSONB for genuinely unpredictable ones. The discipline is maintaining the distinction: when something moves from "we might need this occasionally" to "we filter and sort on this regularly," it graduates to a typed column.

The failure mode of the hybrid pattern is letting everything default to extra because it's easier than writing a migration. That requires a different kind of discipline — naming a specific engineer responsible for schema evolution, with a rule that any field accessed in more than two code paths graduates to a typed column.

The schema is still there

JSONB does not eliminate schema. The schema exists in every piece of application code that reads the column — in the .get() calls, the defensive checks, the type coercions, the comments explaining which rows might have which fields. It's implicit, distributed, and unenforced, which means it drifts silently over time and surprises you when you need to query it.

ALTER TABLE puts the schema where the database can enforce it. JSONB puts the schema where application developers can forget it.

Neither choice is universally wrong. But the cases where JSONB's flexibility outweighs the migration and query costs are narrower than most teams realize when they're shipping features fast and enjoying the absence of migration PRs.

Building something in public? builds.anethoth.com — public build dossiers for software projects in progress.