Postgres pg_stat_wal: Measuring WAL Generation and the Cost of Every Write
WAL is the load-bearing primitive behind Postgres durability and replication. pg_stat_wal, added in Postgres 14, finally exposes how much WAL your workload generates and why. Reading the view changes how you think about write amplification, full-page writes, and replica bandwidth.
Every write to a Postgres table generates write-ahead log (WAL) bytes. Those bytes are the substrate of durability (replayable on crash), of replication (streamed to replicas), and of point-in-time recovery (archived for restore). The volume of WAL you generate determines the network bandwidth your replication needs, the storage cost of your archives, and the I/O ceiling of your write workload. Until Postgres 14, you had to infer WAL generation from indirect signals. The pg_stat_wal view added in 14 finally exposes the counters directly, and reading it changes how you think about the cost of writes.
What pg_stat_wal exposes
The view has one row, with cumulative counters since the last reset. The columns that matter are wal_records (number of WAL records generated), wal_fpi (number of full-page images written), wal_bytes (total WAL bytes), wal_buffers_full (number of times the WAL buffer filled and forced a flush), wal_write (number of times WAL was written from buffers to disk), wal_sync (number of WAL fsyncs), wal_write_time and wal_sync_time (cumulative milliseconds spent writing and syncing), and stats_reset (when counters were last cleared).
The cumulative-counter shape means raw values are not useful by themselves. The useful signals are deltas between two snapshots (rate per second) and ratios between counters (full-page-image share of total bytes, bytes per record, sync time per sync call).
The full-page-image trap
Postgres writes a full image of any page the first time it is modified after a checkpoint. This is the recovery mechanism for torn-page failures: if the system crashes mid-write, the WAL replay can reconstruct the page from the full image regardless of which partial write made it to disk. The cost is that the first modification of any page after a checkpoint costs an entire 8KB page in WAL, regardless of how small the actual change was.
This is the dominant cost in many write workloads. A workload that touches 10,000 unique pages between checkpoints generates roughly 80MB of full-page-image WAL on top of whatever the actual modifications cost. If checkpoints fire every 5 minutes, that is 80MB per 5 minutes regardless of how many or how few logical writes happened. The wal_fpi counter and the ratio of (wal_fpi * 8KB) to wal_bytes tell you what fraction of your WAL volume is full-page-images vs actual change records.
The remediation, when full-page-images dominate, is longer checkpoint intervals. Increasing checkpoint_timeout and max_wal_size lets more modifications hit the same page within a checkpoint cycle, amortizing the full-page-image cost across more writes. The trade-off is longer recovery time after a crash, since more WAL has to replay. The right setting is workload-dependent and is one of the things pg_stat_wal makes possible to tune empirically rather than by guess.
The wal_buffers_full diagnostic
When the WAL buffer fills before the next scheduled flush, a backend has to flush WAL synchronously before continuing. This stalls the writing backend and is visible as the wal_buffers_full counter incrementing. A non-zero growing rate on wal_buffers_full is a signal that wal_buffers (the configuration parameter) is undersized for your write rate. The default of -1 lets Postgres pick based on shared_buffers (1/32 of shared_buffers, capped at the WAL segment size). For write-heavy workloads, explicitly setting wal_buffers to 16MB or higher can eliminate the synchronous flush stalls.
The relationship to wal_sync_time matters: a workload with high wal_sync_time and high wal_buffers_full is paying both for inadequate buffering and for slow fsync. Increasing wal_buffers reduces the count of fsyncs; faster storage reduces the time per fsync; synchronous_commit=off eliminates the fsync wait at the cost of durability for the last few hundred milliseconds of committed transactions.
The replication bandwidth implication
WAL bytes are what gets streamed to replicas. A workload generating 500MB/s of WAL needs 500MB/s of network bandwidth to each streaming replica. The pg_stat_wal counters let you calculate this directly: delta(wal_bytes) over delta(time) gives the WAL generation rate, and that rate is the bandwidth requirement per replica. Logical replication has different bandwidth characteristics because the protocol sends decoded changes rather than raw WAL bytes, but the WAL-to-decode pipeline is what produces the changes, so the raw rate is still the relevant ceiling.
This calculation is the basis for capacity planning replica bandwidth. A workload generating 100GB of WAL per day needs to be able to stream that much per day, plus headroom for spikes. If the WAL generation pattern is bursty (heavy ETL jobs at scheduled times, for example), the peak bandwidth requirement can be 10x the average, and the network sizing needs to handle the peak.
The everyday diagnostic queries
Three queries should be in any operator's toolbox.
First, current WAL generation rate (calculate over time):
SELECT now() - stats_reset AS interval,
pg_size_pretty(wal_bytes) AS total_bytes,
round(wal_bytes / EXTRACT(EPOCH FROM (now() - stats_reset))) AS bytes_per_sec,
wal_records, wal_fpi,
round(100.0 * (wal_fpi * 8192.0) / NULLIF(wal_bytes, 0), 1) AS fpi_pct
FROM pg_stat_wal;Second, sync performance:
SELECT wal_sync, wal_sync_time,
round(wal_sync_time / NULLIF(wal_sync, 0), 3) AS avg_sync_ms,
wal_write, wal_buffers_full
FROM pg_stat_wal;Third, periodic sampling to track trends requires either external tooling or a cron job that snapshots the counters to a stats table for later analysis. The reset operation via pg_stat_reset_shared('wal') zeros the counters; use it sparingly because it discards baseline for delta calculations.
What pg_stat_wal does not show
The view does not show per-query, per-table, or per-database WAL attribution. Identifying which workload component generated which WAL bytes requires either pg_stat_statements with the wal_bytes column (added in Postgres 13), or analyzing WAL files directly with pg_waldump. The pg_stat_statements integration is the practical answer for attribution; pg_waldump is the deep-diagnostic tool for understanding specific WAL records.
The view also does not show replication-related WAL state (slot lag, replica position). Those live in pg_stat_replication on the publisher side and pg_stat_subscription on the subscriber side, both of which were covered in earlier posts in this series.
Our position
Our four products (DocuMint for PDF invoice generation, CronPing for cron job monitoring, FlagBit for feature flags, and WebhookVault for webhook debugging) all run on SQLite with WAL mode, which has its own WAL semantics distinct from Postgres but addresses similar problems. Our planned Postgres migration includes WAL monitoring as a Day 1 dashboard requirement because WAL generation rate is the metric that bounds replication, archiving, and recovery decisions.
Deeper observation
WAL is one of the operational details that becomes a load-bearing performance and capacity constraint as a Postgres deployment grows. At small scale, WAL is invisible. At medium scale, WAL becomes a configuration concern (wal_buffers, checkpoint settings). At large scale, WAL becomes a capacity-planning concern (replica bandwidth, archive storage, recovery time). pg_stat_wal makes the metric visible enough to tune empirically rather than by guess, which is the difference between operating Postgres at scale and operating it past the point where the defaults stop fitting.