Postgres Transaction ID Wraparound: How to Prevent the Catastrophic Shutdown
Most Postgres failure modes are gradual. Bloat accumulates and queries get slower; a forgotten replication slot fills the disk over weeks; a missing index turns a five-millisecond query into a five-second one. Transaction ID wraparound is the rare exception. The database runs at full speed up to the moment it refuses to start, at which point recovery requires a single-user mode VACUUM that can take hours or days depending on table size. The failure mode is documented prominently in the Postgres manual, autovacuum is supposed to prevent it, and yet teams still hit it. The pattern that produces a wraparound shutdown in 2026 is consistent enough to be worth understanding.
What transaction IDs are and why they wrap
Postgres assigns every transaction a 32-bit identifier (the XID). Each row in every table carries the XID of the transaction that inserted it and, if the row has been modified, the XID of the transaction that deleted or updated it. MVCC uses these XIDs to determine which rows are visible to which transactions; a row is visible to a transaction if its inserting XID is older than the current transaction's snapshot and its deleting XID either does not exist or is newer than the snapshot.
The 32-bit counter has approximately 4.3 billion possible values, but Postgres uses modular arithmetic to interpret the counter circularly: half the space (about 2 billion XIDs) is considered "in the past" and the other half is considered "in the future" relative to the current value. When the counter passes 2 billion since a row was written, that row would appear to be in the future and would become invisible to all current transactions, which would silently lose data. To prevent this, Postgres requires that rows older than a configurable threshold be "frozen" before the counter advances far enough to make them look future-dated.
Freezing is a VACUUM operation that replaces the inserting XID on old rows with a special FrozenXID marker that always compares as older than every current XID. Once frozen, a row is safe forever from wraparound concerns. The autovacuum daemon runs vacuum freeze automatically when tables approach the threshold, and this is supposed to happen invisibly without operator intervention. The failure mode is when freezing falls behind and the counter approaches the safety threshold faster than autovacuum can catch up.
The chronology of a wraparound shutdown
The XID counter advances by one for every transaction the database performs. A moderately busy database might consume 100 million XIDs per day; a very busy database might consume a billion or more. At that rate, the 2 billion threshold can be reached in days to weeks if freezing is not keeping up. The chronology of a wraparound incident is consistent: weeks before the shutdown, the oldest unfrozen XID age starts climbing without coming back down; days before the shutdown, autovacuum becomes aggressive about freezing but cannot catch up; hours before the shutdown, Postgres starts emitting warnings to the log about the wraparound threshold; minutes before the shutdown, the database refuses to start new transactions and goes into safety mode.
The shutdown is conservative: Postgres refuses to perform any new writes (and eventually any reads of unfrozen rows) rather than risk silent data corruption. The recovery is single-user mode: stop the database, restart with the --single flag, and run VACUUM against the database. The recovery operation reads every unfrozen row in every table and updates the visibility metadata. The duration depends on database size; for terabyte-scale databases, recovery can take days during which the database is unavailable for production use.
Why autovacuum falls behind
Autovacuum's freeze logic is governed by autovacuum_freeze_max_age (default 200 million XIDs) and vacuum_freeze_table_age (default 150 million). When a table's oldest XID is more than autovacuum_freeze_table_age XIDs old, the next autovacuum on that table promotes itself to a full-table freeze. When the oldest XID exceeds autovacuum_freeze_max_age, autovacuum runs the freeze even on tables that would otherwise not qualify for vacuuming, and it will not be deterred by the usual cost-throttling limits.
The standard reasons autovacuum falls behind are familiar. Long-running transactions hold back the xmin horizon and prevent VACUUM from cleaning up rows whose xmax is newer than the oldest active transaction; the same mechanism prevents freezing from making progress on those rows. A connection that runs a 24-hour analytical query, or an idle-in-transaction connection that nobody noticed, can completely block freeze progress across the whole database. Aggressive write workloads that consume XIDs faster than VACUUM can scan tables also produce backlog. Per-table autovacuum overrides that disable freezing on specific tables (a configuration mistake that occasionally appears) are the rarer cause.
The cost-throttling configuration matters too. The defaults for vacuum_cost_limit and vacuum_cost_delay were chosen for spinning-rust storage in the 2000s; on modern NVMe storage they are wildly conservative, sometimes producing freeze rates of a few megabytes per second when the storage could comfortably handle hundreds. A database with terabyte tables and default cost-throttling cannot complete a freeze in any reasonable time.
The monitoring queries that catch it early
The single most important query for catching wraparound problems is the oldest-XID-age query against pg_database: SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;. The output shows how many XIDs each database is from the freeze horizon. The default autovacuum_freeze_max_age of 200 million is the autovacuum threshold; the danger zone is anything above 1 billion; the catastrophic zone is anything above 1.5 billion. Alerts should fire well below the catastrophic zone, ideally around 500 million.
The per-table version is SELECT schemaname, relname, age(relfrozenxid) FROM pg_class JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace WHERE relkind = 'r' ORDER BY age(relfrozenxid) DESC LIMIT 20; which identifies the specific tables that are oldest. The largest oldest table is usually the table that autovacuum is failing to freeze; investigating why (long-running transactions, blocked autovacuum, cost-throttling) is the right next step.
The autovacuum activity query against pg_stat_progress_vacuum shows what autovacuum is currently doing and how far through the current operation it is. If autovacuum is constantly running on the same table without making progress, the table is large enough that the cost-throttling configuration is the bottleneck.
The configuration changes that prevent it
The most important configuration change is realistic cost-throttling. On NVMe storage, autovacuum_vacuum_cost_limit can usually be increased to 2000 or higher (from default 200) without saturating the disk; autovacuum_vacuum_cost_delay can be reduced to 2ms or 0ms (from default 2ms). The combined effect is to let autovacuum run at roughly the disk's read bandwidth, which is the right tradeoff on storage that has bandwidth to spare.
The second change is more parallelism via autovacuum_max_workers (default 3). Increasing to 5-8 workers helps when many tables need vacuuming concurrently; it is not helpful when one giant table is the bottleneck (the per-table vacuum is single-threaded except for index vacuuming).
The third change is per-table tuning for tables that consume XIDs disproportionately. A hot table with billions of rows and frequent updates should have aggressive autovacuum settings: ALTER TABLE big_table SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_freeze_min_age = 100000, autovacuum_freeze_table_age = 100000000);. The settings cause more frequent freezing on the specific table without affecting global autovacuum behavior.
The idle-in-transaction safety net
The idle_in_transaction_session_timeout setting (defaulting to 0 = disabled, but should be set to something like 30 seconds in production) is the most underused freeze-related setting. An idle-in-transaction connection holds an open transaction snapshot that prevents freeze progress; the timeout terminates such connections automatically. Setting it to a reasonable value eliminates one of the most common ways for autovacuum to silently fall behind.
The recovery operationally
If a wraparound shutdown happens despite the precautions, the recovery is mechanical but slow. Stop the database. Restart in single-user mode: postgres --single -D /var/lib/postgresql/data dbname. Run VACUUM; at the prompt; this is unthrottled and runs as fast as the disk allows. For a terabyte-scale database the operation can take hours; for multi-terabyte databases it can take days. There is no way to make the database available for production use during this time; the safety mode is intentionally restrictive.
The retrospective on a wraparound incident is usually straightforward: the monitoring was inadequate to alert before the catastrophic zone, or the alerts existed but were ignored. The fix is usually the same combination of configuration changes (faster autovacuum, idle-in-transaction timeout, per-table tuning) that would have prevented the incident.
Across our four products
We run DocuMint, CronPing, FlagBit, and WebhookVault on SQLite, which has no equivalent to XID wraparound; SQLite uses 64-bit ROWID values that will not wrap in any realistic operational lifetime. The eventual Postgres migration plan includes the standard wraparound prevention checklist as a launch requirement: realistic cost-throttling, idle-in-transaction timeout, monitoring on age(datfrozenxid) with alerts at 500 million, per-table tuning on any hot table.
The deeper observation is that the categories of database failure that take systems completely offline are usually the ones where the safety mechanism is conservative enough to refuse work rather than risk corruption. Transaction ID wraparound is one of these; replication slot disk-fill is another. The reliable way to avoid them is to monitor the underlying counters (XID age, slot WAL size) and alert well before the safety mechanism activates. The instinct to ignore boring monitoring metrics on counters that are slowly increasing toward thresholds is exactly the instinct that produces these incidents.