Postgres Generated Columns: When Computed-Stored Beats Application Logic
Most teams reach for a denormalized column the same way every time. A new query needs to filter by lowercase email, or by month-of-creation, or by full-name-concatenated. The team adds a column, writes a backfill, updates the application to populate it on every insert and update, and adds an index. Six months later, an engineer adds a new code path that updates the underlying columns but forgets the denormalized one. The bug is discovered when a customer reports that search no longer finds them.
Generated columns solve this. They are columns whose values are computed from other columns, automatically, by the database itself. Postgres has had stored generated columns since version 12 in 2019, and virtual (computed-on-read) generated columns since version 18 in 2025. They eliminate the out-of-sync failure mode by removing the application's responsibility for maintaining the derived value. We use generated columns across DocuMint, CronPing, FlagBit, and WebhookVault, and they are one of the schema features whose value compounds over time.
The basic syntax and what it gives you
A stored generated column looks like this:
CREATE TABLE customers (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL,
email_lower TEXT GENERATED ALWAYS AS (lower(email)) STORED
);
CREATE INDEX idx_customers_email_lower ON customers (email_lower);The application inserts and updates the email column normally. The email_lower column is computed by Postgres on every write, stored alongside the row, and available for indexing and querying. The application does not know it exists from a write perspective; from a read perspective, it is a normal column.
The STORED keyword is currently required in Postgres versions before 18. Stored generated columns occupy disk space and are computed at write time. Virtual generated columns are computed at read time and store nothing extra; they are useful when the computation is cheap and the column is rarely accessed, but they cannot be indexed in current Postgres versions. For most production use cases, stored is the right default.
Cases where generated columns shine
The pattern is most valuable when the derived value needs to be indexed. Without generated columns, you have three options for indexing a derived value: a denormalized column maintained by the application (fragile), an expression index like CREATE INDEX ON customers (lower(email)) (works but has limitations), or doing the derivation on every read query (slow). Generated columns combine the indexability of denormalized columns with the always-correct guarantee of expression indexes.
The first case is normalization for search. Email addresses, usernames, and other identifiers often need case-insensitive uniqueness or case-insensitive search. A generated email_lower column with a unique index enforces both. The application can use a normal WHERE email_lower = lower($1) and get an index scan, and the constraint cannot be bypassed by writing through a code path that forgot to normalize.
The second case is denormalized derived values for filtering. A FlagBit table that tracks flag evaluations might want to filter by date_trunc('day', evaluated_at) for daily reporting. A generated evaluated_date column with an index makes this a fast equality query instead of a slow scan with function application. The same pattern works for month, week, hour-of-day, day-of-week, and any other time-bucket grouping.
The third case is composite keys with normalized representations. A WebhookVault table that deduplicates events by (source, external_id) can have a generated dedup_key TEXT GENERATED ALWAYS AS (source || ':' || external_id) STORED with a unique index. The query is a single equality lookup, the constraint is enforced at the database level, and the format is unambiguous.
The fourth case is computed flags for partial indexes. A CronPing monitors table might want a fast lookup of monitors that are both active and overdue. A generated is_overdue BOOLEAN GENERATED ALWAYS AS (last_ping_at < now() - schedule_interval) STORED would let you create a partial index WHERE is_overdue AND is_active. The wrinkle here is that now() is not allowed in generated columns because it is not deterministic; this case actually requires a different pattern, which we will cover below.
The restrictions and why they exist
Generated columns have several restrictions that catch teams by surprise. The expression must be immutable: only deterministic functions of the row's other columns. now(), random(), current_user, and anything that reads from other tables are all disallowed. This restriction exists because the database needs to be able to recompute the column when the underlying columns change, and to know that two rows with the same input columns have the same generated value. A column that depended on the current time would have different values depending on when it was computed.
The expression cannot reference other generated columns. This restriction is purely architectural and may be relaxed in future Postgres versions, but for now you cannot chain generated columns. The workaround is to write the chained expression inline.
The expression cannot reference columns of other tables. Generated columns are strictly row-local. Cross-table derivations have to be done either with materialized views or with application-maintained denormalized columns.
The column cannot be referenced as part of a foreign key or as the target of an UPDATE. The first restriction is annoying but rarely binding. The second is more confusing: applications that try to write to a generated column will get an error. The application has to write to the underlying columns and let Postgres compute the generated value.
The not-quite-generated patterns
For the "overdue monitor" case above, where the derivation depends on the current time, the right pattern is not a generated column but an indexed expression. The query SELECT * FROM monitors WHERE last_ping_at + schedule_interval < now() can use an expression index on (last_ping_at + schedule_interval): CREATE INDEX ON monitors ((last_ping_at + schedule_interval)). The index stores the deadline, and the query becomes a range scan against now().
For derivations that genuinely need to read other tables, the right pattern is either a materialized view (refreshed periodically) or a denormalized column maintained by a trigger or by the application. Each of these is more failure-prone than a generated column, which is exactly why generated columns are so valuable for the cases where they apply.
Performance considerations
Stored generated columns cost something on every write. The expression must be evaluated, and the value must be written to disk. For cheap expressions like lower(email) or string concatenation, the cost is unmeasurable. For expensive expressions like complex JSON path extraction or text-search vector construction, the cost can be significant on write-heavy tables.
The flip side: reads of the generated column are free. Without a generated column, every query that needs the derived value pays the computation cost on every read. A generated column trades one write-time computation for many read-time savings, and on workloads with read-write ratios greater than 1:1 (which is most workloads), the trade is unambiguously favorable.
The write cost is paid in two places: the actual computation, and the WAL volume from the larger row. Large generated columns (full-text search vectors, expanded JSON structures) can substantially increase WAL volume, with downstream effects on replication lag and backup size. This is rarely a reason to avoid generated columns but worth measuring on high-throughput tables.
Migration patterns
Adding a generated column to an existing table is a metadata-only operation in Postgres 14 and later, the same way that adding a column with a constant default is metadata-only. Adding a generated column with an expression that depends on other columns is also metadata-only at ALTER TABLE time, but Postgres has to rewrite the table to populate the new column. On large tables, this is a long operation that holds an ACCESS EXCLUSIVE lock for the duration. The workaround is the same as for any large table rewrite: do it during a maintenance window, or do it via a swap-tables pattern with a trigger to keep both copies in sync.
Removing a generated column is fast and safe: it is just a DROP COLUMN. Changing the expression is not directly supported; you have to drop the column and recreate it, which is again a large operation on large tables.
The most subtle migration consideration is that generated columns can break logical replication setups that do not include them in the publication. Logical replication publishes the data that the application sees, which includes the regular columns but not the generated ones. The subscriber side recomputes the generated columns from the underlying columns, which works as long as both sides have the same generation expression. Schema drift between publisher and subscriber breaks this assumption silently.
What this is and is not
The honest summary: generated columns are an underused feature that eliminates an entire class of denormalization bugs. They are the right answer whenever the derived value needs to be indexed, the derivation is deterministic and row-local, and the team would otherwise have to maintain the denormalization in the application.
They are not a substitute for thinking about the schema. A generated column that exists because the team did not want to refactor the underlying schema is technical debt with a database-shaped wrapper. A generated column that exists because the derived value is genuinely first-class data that needs to be queried and indexed is the right tool for the job. The discipline is the same as for any database feature: use it where it fits, and resist the temptation to use it as a shortcut for harder schema decisions.