Postgres pg_stat_archiver: Monitoring WAL Archiving Before Your Replicas Fall Behind

Your Postgres cluster is archiving WAL. You set up the archive command, the directory exists, and nothing is on fire. That's not the same as archiving working.

pg_stat_archiver is the view that tells you whether your archiver is actually keeping up. It's been in Postgres since version 9.4 and most teams never look at it until something breaks.

What the view contains

The columns that matter:

archived_count: Total WAL files successfully archived since the last stats reset
last_archived_wal: Name of the most recently archived WAL file
last_archived_time: Timestamp of that successful archive
failed_count: Total archive failures since stats reset
last_failed_wal: Name of the WAL file that last failed to archive
last_failed_time: Timestamp of that failure
stats_reset: When the counters were last reset

The query to run first:

SELECT
  archived_count,
  last_archived_wal,
  last_archived_time,
  failed_count,
  last_failed_wal,
  last_failed_time,
  now() - last_archived_time AS time_since_last_archive
FROM pg_stat_archiver;

What a healthy archiver looks like

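failed_count is zero, or at least not rising. time_since_last_archive is low: under a minute on a busy system, under 10 minutes on a quiet one that has archive_timeout set (without archive_timeout, an idle system archives nothing until a segment fills, so a long gap is not by itself a fault). last_failed_wal is null unless there has been a failure since the last stats reset.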

If failed_count is nonzero and last_failed_time is recent, your archive command is failing. Postgres will keep retrying, but the WAL files are accumulating in pg_wal/ on the primary. Left unchecked, you'll fill the disk.

The lag calculation that matters

The archiver processes WAL files in order. To see how far behind it is, compare the current WAL position against the last archived file:

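SELECT
  pg_walfile_name(pg_current_wal_lsn()) AS current_wal,
  last_archived_wal,
  -- Start LSN of the last archived segment: log id (chars 9-16) plus the low
  -- byte of the segment number. Assumes the default 16 MB wal_segment_size.
  pg_wal_lsn_diff(
    pg_current_wal_lsn(),
    (substr(last_archived_wal, 9, 8) || '/' || substr(last_archived_wal, 23, 2) || '000000')::pg_lsn
  ) AS lag_bytes
FROM pg_stat_archiver
-- Skip timeline .history entries, which carry no segment position.
WHERE last_archived_wal ~ '^[0-9A-F]{24}';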

A few hundred megabytes of lag is normal on a write-heavy system. Several gigabytes means the archiver is falling behind faster than it's catching up — investigate the archive command's throughput and destination I/O.
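
Another way to gauge the backlog, offered as a sketch rather than a definitive method, is to count the .ready marker files that Postgres keeps in pg_wal/archive_status for segments still waiting on the archiver. This assumes PostgreSQL 12 or later, where pg_ls_archive_statusdir() is available, and a role that is a superuser or a member of pg_monitor. The column aliases are illustrative.

```sql
-- Count WAL files still waiting to be archived (.ready markers).
-- Assumes PostgreSQL 12+ and superuser or pg_monitor membership.
SELECT
  count(*) AS segments_waiting,
  -- Approximate: treats every waiting file as one full WAL segment.
  pg_size_pretty(
    count(*) * pg_size_bytes(current_setting('wal_segment_size'))
  ) AS approx_backlog,
  min(modification) AS oldest_ready_marker
FROM pg_ls_archive_statusdir()
WHERE name LIKE '%.ready';
```

Unlike the LSN arithmetic, this does not depend on the segment size or on parsing file names, at the cost of needing elevated privileges.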

Archive failures and what they tell you

The last_failed_wal column is more useful than the count. It names the specific file that failed, which tells you where in the WAL stream, and so roughly when, the failure started. Because Postgres retries the same file indefinitely, compare it with the success columns. If last_failed_wal and last_archived_wal are the same file, or last_archived_time is later than last_failed_time, the archiver got through the failure and kept going. If last_failed_time is recent and later than last_archived_time, the archiver is stuck on that file.
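
One way to turn this into a single check is to compare the two timestamps: if the most recent event was a failure, the archiver has not succeeded since. The sketch below does that. The archiver_state labels are hypothetical names chosen for illustration, and 'failing now' only means the latest attempt failed, so a retry may still succeed moments later.

```sql
SELECT
  last_failed_wal,
  last_failed_time,
  last_archived_wal,
  last_archived_time,
  CASE
    WHEN last_failed_time IS NULL THEN 'no failures since reset'
    -- Most recent event was a failure: nothing has been archived since.
    WHEN last_archived_time IS NULL
      OR last_failed_time > last_archived_time THEN 'failing now'
    ELSE 'recovered'
  END AS archiver_state
FROM pg_stat_archiver;
```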

Common failure causes:

Archive destination out of space or unreachable
Archive command permission failure
SSH key rotation that wasn't applied to the archive command
S3 credential expiry on an archive_command using aws s3 cp

Run the archive_command manually as the Postgres OS user, substituting a real WAL file path for %p and its file name for %f, to reproduce the failure with visible error output.
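
Before reproducing the failure by hand, it helps to confirm exactly what the server is configured to run. A minimal sketch: archive_library only exists on PostgreSQL 15 and later, and on older versions that row is simply absent from the result.

```sql
-- Show the archiving configuration the running server is actually using.
SELECT name, setting
FROM pg_settings
WHERE name IN ('archive_mode', 'archive_command', 'archive_library', 'archive_timeout')
ORDER BY name;
```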

Alerting thresholds

Three signals worth monitoring:

failed_count increasing (not just nonzero — it increases when archiving is failing actively)
now() - last_archived_time > interval '15 minutes' on a write-active system
now() - last_archived_time > interval '2 hours' on any system (silence means either nothing to archive or a hung archiver)
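
The two time-based thresholds can be sketched as a single query for a monitoring agent to poll. The column aliases are illustrative, and the 15-minute and 2-hour values are the ones suggested above, not universal defaults. Detecting a rising failed_count needs two samples, so that comparison is left to whatever stores the previous value.

```sql
SELECT
  failed_count,  -- alert when this is higher than the previous sample
  now() - last_archived_time AS time_since_last_archive,
  -- Both flags are NULL if nothing has been archived since the last reset.
  now() - last_archived_time > interval '15 minutes' AS over_15_minutes,
  now() - last_archived_time > interval '2 hours' AS over_2_hours
FROM pg_stat_archiver;
```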

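The silence case matters. A replica using restore_command to replay archived WAL will fall behind if the primary stops archiving. pg_stat_archiver on the primary shows only the stalled last_archived_time, not the replica's delay, and pg_stat_replication lists streaming connections only, so a replica fed purely from the archive does not appear there. Measure the delay on the replica itself, for example with now() - pg_last_xact_replay_timestamp().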
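
For a replica that relies on restore_command, one way to see the delay is to ask the replica directly, since the primary's views describe only the archiver and streaming connections. A minimal sketch, assuming PostgreSQL 10 or later for pg_last_wal_replay_lsn(); the aliases are illustrative.

```sql
-- Run on the replica, not the primary.
SELECT
  pg_is_in_recovery() AS is_replica,
  pg_last_wal_replay_lsn() AS replayed_lsn,
  pg_last_xact_replay_timestamp() AS last_replayed_commit,
  now() - pg_last_xact_replay_timestamp() AS replay_delay;
```

replay_delay also grows when the primary is simply idle, so read it alongside the primary's write activity rather than on its own.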

What pg_stat_archiver does not show

It does not show per-file archive times. You can't compute archiver throughput from this view alone. Combine it with the WAL generation rate from pg_stat_wal.wal_bytes (available from PostgreSQL 14) to estimate whether the archiver is keeping pace.
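
A rough sketch of that comparison follows, assuming PostgreSQL 14 or later. It uses lifetime averages since each view's stats_reset, which is crude: sampling both views twice and differencing gives a better picture. The archived figure treats every archived file as one full segment, so it overstates throughput when archive_timeout forces partly filled segments out. Either rate is NULL if the corresponding stats_reset is NULL.

```sql
-- Average WAL generated vs. archived, in bytes per second, since stats reset.
SELECT
  round(
    w.wal_bytes
    / nullif(extract(epoch FROM now() - w.stats_reset), 0)
  ) AS wal_generated_bytes_per_sec,
  round(
    a.archived_count * pg_size_bytes(current_setting('wal_segment_size'))
    / nullif(extract(epoch FROM now() - a.stats_reset), 0)
  ) AS wal_archived_bytes_per_sec
FROM pg_stat_wal AS w
CROSS JOIN pg_stat_archiver AS a;
```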

It does not show replication slot WAL retention. A slot holding back WAL removal is a separate concern. Check pg_replication_slots for that, in particular restart_lsn and, from PostgreSQL 13, wal_status.
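
A minimal sketch of that slot check, assuming PostgreSQL 13 or later for wal_status and safe_wal_size, and run on the primary. The retained_wal alias is illustrative.

```sql
-- WAL held back by each replication slot, oldest restart point first.
SELECT
  slot_name,
  slot_type,
  active,
  wal_status,
  pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
  ) AS retained_wal,
  safe_wal_size
FROM pg_replication_slots
ORDER BY restart_lsn;
```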

It does not tell you whether archived files are restorable. Archiving a file doesn't mean the file is valid. Run pg_verifybackup on your base backups and spot-check WAL files by attempting a restore in a separate environment. pg_stat_archiver only knows the archive command exited 0.

Resetting counters

Counters persist until explicitly reset:

SELECT pg_stat_reset_shared('archiver');

Reset after resolving a failure so you can watch for new failures cleanly. Don't reset as a substitute for fixing the underlying problem — you'll lose the history of when the failure started.
