Applies toPostgres 14+Viewpg_stat_walRiskDisk exhaustion from runaway WAL generation
Your database is writing to disk constantly. WAL — the write-ahead log — is how Postgres guarantees crash recovery and replication. Every INSERT, UPDATE, and DELETE produces WAL records. So does every checkpoint, every full-page image written after a checkpoint, and every sequence tick.
Before Postgres 14, measuring WAL generation rate required external tooling or sampling pg_current_wal_lsn() in a loop. Postgres 14 introduced pg_stat_wal, a single-row view that accumulates WAL statistics since the last pg_stat_reset_shared('wal').
The columns that matter
SELECT
wal_records,
wal_fpi,
wal_bytes,
wal_buffers_full,
stats_reset
FROM pg_stat_wal;
- wal_records: total WAL records generated. Each DML statement produces at least one.
- wal_fpi: full-page images written. This is usually your dominant WAL volume driver after checkpoints.
- wal_bytes: total bytes of WAL generated. The number your disk cares about.
- wal_buffers_full: how often WAL buffer flushes were triggered by a full buffer rather than a checkpoint. Non-zero values mean
wal_buffersis undersized. - stats_reset: when statistics were last reset. Important for interpreting rates.
Full-page images: the silent WAL amplifier
After every checkpoint, the first write to any page is written in full to WAL — not just the changed bytes. This is what full-page images (wal_fpi) are. On a busy database, wal_fpi can account for 70–90% of total wal_bytes.
The ratio wal_fpi / wal_records tells you how checkpoint-write-heavy your workload is. High ratios after checkpoint events are expected. Sustained high ratios between checkpoints suggest either very short checkpoint intervals or write patterns that touch many distinct pages.
Measuring rate with two snapshots
-- snapshot 1
SELECT wal_bytes AS bytes_start, now() AS t_start
FROM pg_stat_wal gset
-- wait 60 seconds
-- snapshot 2
SELECT
(wal_bytes - :bytes_start) / EXTRACT(EPOCH FROM (now() - :'t_start')) AS wal_bytes_per_sec
FROM pg_stat_wal;
On standard spinning disk, sustained rates above 50 MB/s warrant investigation. NVMe changes that number significantly, but the rate still matters for replication lag budgets and backup window sizing.
Prometheus/Grafana monitoring query
SELECT
wal_bytes,
wal_records,
wal_fpi,
wal_buffers_full,
EXTRACT(EPOCH FROM (now() - stats_reset)) AS seconds_since_reset
FROM pg_stat_wal;
Export this via your Postgres exporter. Graph rate(wal_bytes[5m]) in Prometheus. Alert on sustained rates that would fill your WAL partition within your incident response window.
Tuning levers this view informs
checkpoint_timeout: shorter checkpoints mean more frequent checkpoint boundaries, which means more full-page images. If wal_fpi is high relative to wal_records, consider lengthening checkpoint_timeout and raising max_wal_size to compensate.
wal_buffers: if wal_buffers_full is non-zero under normal load, increase wal_buffers. The default auto-size is usually fine for read-heavy workloads; write-heavy workloads often need it set explicitly to 64–256MB.
What pg_stat_wal does not show
It cannot tell you which tables are generating the most WAL. For that, you need pg_stat_statements to identify high-frequency write queries, then estimate per-statement WAL cost via EXPLAIN (WAL) added in Postgres 13.
It also does not break down WAL by replication slot consumer. Slot lag is a separate concern; use pg_replication_slots for that.
The view resets on pg_stat_reset_shared('wal') or server restart. Build your monitoring around the rate, not the cumulative total.
Building in public at builds.anethoth.com — public build dossiers for software projects in progress.