Postgres pg_stat_wal_receiver: Monitoring Your Standby Before It Falls Behind

View: pg_stat_wal_receiver
Available on: Standby servers only
Introduced: PostgreSQL 9.6
Use for: Standby-side replication lag monitoring

Most engineers who monitor Postgres replication watch pg_stat_replication on the primary. That view shows you what the primary knows: how far behind each standby is, whether it's streaming, what LSN it last confirmed. It's the right view for operator dashboards and alerting on the primary side.

But pg_stat_wal_receiver is different. It lives on the standby, shows you what the standby knows about its own connection to the primary, and captures failure modes that pg_stat_replication simply cannot see. When your standby loses connectivity and the primary's view shows a dropped row rather than a stale one, pg_stat_wal_receiver is the view that tells you the standby's story.

What the view shows

Run this on a standby server:

SELECT * FROM pg_stat_wal_receiver;

The output has one row, or zero rows if no WAL receiver process is running. Key columns:

status: the current state of the WAL receiver process. Values are stopped, starting, streaming, waiting, restarting, and stopping. In steady state you should see streaming; anything else deserves a look.
receive_start_lsn: the LSN where this receiver started receiving WAL. Changes on reconnect.
receive_start_tli: the timeline ID at receiver start.
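written_lsn: the LSN through which WAL has been written to disk on the standby, but not necessarily flushed (PostgreSQL 13 and later).
flushed_lsn: the LSN through which WAL has been durably flushed (fsync'd) on the standby. Before PostgreSQL 13 this column was named received_lsn.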
received_tli: the timeline ID of the most recently received WAL.
last_msg_send_time: the send time, as stamped by the primary, of the last message the standby received from it.
last_msg_receipt_time: when the standby last received a message from the primary.
latest_end_lsn: the LSN of the latest WAL location reported to the primary.
latest_end_time: when latest_end_lsn was last updated.
sender_host and sender_port: where the standby is connected.
conninfo: the connection string used (with password masked).

The standby vantage point vs the primary vantage point

pg_stat_replication on the primary shows the primary's view: it tracks what the primary has sent and what each standby has acknowledged. When a standby disconnects, the row disappears from pg_stat_replication entirely. You know the standby is gone, but you don't know why from the primary's perspective.

pg_stat_wal_receiver on the standby shows the standby's view: it persists even during reconnection attempts, captures the connection parameters being used, and lets you see the standby's own assessment of how stale it is. Different failure modes are visible from different vantage points:

Network partition (visible from standby): status = 'restarting', last_msg_receipt_time going stale
Primary crash (visible from standby): same signals, plus sender_host potentially pointing to a failed node
Replication slot exhausted (visible from primary): row repeatedly appears with state = 'startup'
Authentication failure (visible from standby): status = 'restarting' in a tight loop, with password errors in the standby's server log

Detecting lag from the standby side

The most operationally useful lag calculation requires a cross-server query—comparing the standby's latest_end_lsn to pg_current_wal_lsn() on the primary:

-- Run on the standby, cross-referencing against primary LSN you fetch separately
SELECT
  latest_end_lsn,
  latest_end_time,
  now() - latest_end_time AS time_since_last_report,
  pg_wal_lsn_diff(
    '0/FEDCBA98'::pg_lsn,  -- substitute current primary LSN
    latest_end_lsn
  ) AS lag_bytes
FROM pg_stat_wal_receiver;

In practice, this means either querying both servers in your monitoring system and computing the diff, or relying on the primary's pg_stat_replication for lag bytes and using pg_stat_wal_receiver for connection health signals.
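If the monitoring system fetches one LSN from each server, the diff can be computed on the client instead of in SQL. The sketch below assumes both LSNs arrive as text in the usual pg_lsn form (two hexadecimal halves separated by a slash). The function names parse_lsn and lag_bytes are hypothetical, and the sample values are made up for illustration.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a pg_lsn string such as '0/FEDCBA98' to a 64-bit byte position."""
    hi, lo = lsn.split("/")
    # The part before the slash is the high 32 bits, the part after is the low 32 bits.
    return (int(hi, 16) << 32) | int(lo, 16)


def lag_bytes(primary_lsn: str, standby_lsn: str) -> int:
    """Bytes of WAL the standby is behind, like pg_wal_lsn_diff(primary, standby)."""
    return parse_lsn(primary_lsn) - parse_lsn(standby_lsn)


# Hypothetical sample values: the primary has just crossed a 4 GB boundary.
print(lag_bytes("1/00000010", "0/FFFFFFF0"))  # 32
```

Because the two LSNs are read at slightly different moments, the result is an approximation. A small negative value is possible if the standby is sampled after the primary.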

The more practical standby-only lag signal is written_lsn != flushed_lsn. When WAL has been written to the OS page cache but not yet flushed to disk, this divergence can indicate disk I/O pressure or an fsync queue backup:

SELECT
  status,
  written_lsn,
  flushed_lsn,
  pg_wal_lsn_diff(written_lsn, flushed_lsn) AS unflushed_bytes,
  last_msg_receipt_time,
  now() - last_msg_receipt_time AS staleness
FROM pg_stat_wal_receiver;
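A single nonzero unflushed_bytes reading is usually transient, so a monitor may want to flag only sustained divergence. The following is a minimal sketch, assuming the monitor keeps a short history of unflushed_bytes samples from the query above. The function name and the default of three consecutive samples are illustrative choices, not recommendations from PostgreSQL.

```python
from typing import Sequence


def sustained_divergence(unflushed_samples: Sequence[int], min_consecutive: int = 3) -> bool:
    """Return True if the most recent `min_consecutive` samples all show unflushed WAL."""
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")
    if len(unflushed_samples) < min_consecutive:
        return False  # not enough history to call it sustained
    recent = unflushed_samples[-min_consecutive:]
    return all(sample > 0 for sample in recent)


# Hypothetical histories, oldest sample first.
print(sustained_divergence([0, 8192, 0, 16384]))      # False: divergence keeps clearing
print(sustained_divergence([0, 8192, 16384, 8192]))   # True: three nonzero samples in a row
```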

Alert patterns

last_msg_receipt_time going stale: This is your main real-time signal. Under normal streaming replication, the standby hears from the primary at least every wal_receiver_timeout / 2 seconds (30 seconds with the default wal_receiver_timeout of 60 seconds), because an idle WAL receiver asks the primary for a reply at that point. If now() - last_msg_receipt_time exceeds 60 seconds, the connection is at risk. At 90 seconds, assume the connection is broken. Alert at 60 seconds, page at 90.

status != 'streaming': If the status is restarting, the WAL receiver is actively trying to reconnect. This is normal for a moment after primary restart but should not persist. If you see restarting for more than 2-3 minutes, investigate network connectivity and pg_hba.conf on the primary.

written_lsn != flushed_lsn: This write-behind condition indicates the standby is receiving WAL faster than it can flush it. Under normal conditions this resolves quickly. Persistent divergence suggests disk I/O saturation on the standby. This doesn't directly indicate lag but foreshadows it.
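The thresholds above can be folded into one small classifier on the monitoring side. This is a sketch under stated assumptions: the function name classify, the severity labels, and the three-minute grace period for a non-streaming status (the upper end of the "2-3 minutes" above) are illustrative choices. The caller is assumed to track how long the status has been something other than streaming.

```python
from datetime import timedelta

WARN_AFTER = timedelta(seconds=60)    # alert threshold for last_msg_receipt_time staleness
PAGE_AFTER = timedelta(seconds=90)    # page threshold for the same signal
RESTART_GRACE = timedelta(minutes=3)  # assumed tolerance for a non-streaming status


def classify(status: str, staleness: timedelta,
             not_streaming_for: timedelta = timedelta(0)) -> str:
    """Return 'ok', 'warn', or 'page' for one sample of pg_stat_wal_receiver."""
    # Staleness is checked first because it is the strongest signal.
    if staleness >= PAGE_AFTER:
        return "page"
    if staleness >= WARN_AFTER:
        return "warn"
    # A brief non-streaming status is tolerated, for example after a primary restart.
    if status != "streaming" and not_streaming_for >= RESTART_GRACE:
        return "warn"
    return "ok"


print(classify("streaming", timedelta(seconds=5)))                          # ok
print(classify("streaming", timedelta(seconds=75)))                         # warn
print(classify("restarting", timedelta(seconds=10), timedelta(minutes=4)))  # warn
```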

What pg_stat_wal_receiver does NOT show

This view only covers the WAL receive process. It does not tell you:

Replay lag on the standby: How far behind the standby's applied state is from its received WAL. For this, use pg_last_wal_replay_lsn() and pg_last_xact_replay_timestamp() on the standby. A standby can be fully caught up on receiving but still replaying old transactions.
Query conflicts: Long-running queries on a hot standby can delay WAL replay. Use pg_stat_replication on the primary (specifically replay_lag) and standby-side pg_stat_activity to identify conflicts.
Replication slot retention risk: Slots on the primary are tracked via pg_replication_slots on the primary, not here. A standby that falls far behind with a retained slot can fill the primary's disk.

The complete monitoring picture

Robust standby monitoring requires both vantage points:

On the primary: pg_stat_replication for per-standby lag (write_lag, flush_lag, and replay_lag as time intervals, plus byte lag computed from the LSN columns), connection count, and sync state. Alert on replay_lag > 30 seconds for synchronous standbys, > 5 minutes for asynchronous.

On the standby: pg_stat_wal_receiver for connection health (staleness of last_msg_receipt_time, status), plus pg_last_xact_replay_timestamp() for the human-readable age of the last applied transaction.

-- Standby health summary — run on the standby
SELECT
  r.status,
  r.last_msg_receipt_time,
  now() - r.last_msg_receipt_time AS connection_staleness,
  r.written_lsn,
  r.flushed_lsn,
  pg_last_wal_replay_lsn() AS replay_lsn,
  now() - pg_last_xact_replay_timestamp() AS replay_lag  -- also grows while the primary is idle
FROM pg_stat_wal_receiver r;

When the view is empty

If SELECT * FROM pg_stat_wal_receiver returns zero rows on a standby, one of three things is true:

The standby.signal file is missing or primary_conninfo is not set (the server is not in standby mode, or it has no primary to stream from)
The WAL receiver process crashed and has not restarted (check the server log for errors)
You're running on the primary, not the standby

The absence of a row is not the same as a disconnected standby. A disconnected standby still has a WAL receiver process attempting to reconnect—its row shows status = 'restarting'. A missing row means the WAL receiver process itself does not exist.
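One way to encode this distinction in a monitoring check is sketched below. It assumes the caller has already fetched the single row of pg_stat_wal_receiver as a dict (or None when the view is empty) along with the result of SELECT pg_is_in_recovery(). The function name interpret and the message strings are hypothetical.

```python
from typing import Optional


def interpret(row: Optional[dict], in_recovery: bool) -> str:
    """Explain what an empty or non-empty pg_stat_wal_receiver result means."""
    if not in_recovery:
        # pg_is_in_recovery() is false on a primary, so an empty view is expected there.
        return "not a standby"
    if row is None:
        # In recovery but no row: no WAL receiver process exists at all.
        return "no WAL receiver process"
    if row["status"] == "streaming":
        return "streaming"
    # A row exists, so the process is alive but not in steady state.
    return "receiver present, status=" + row["status"]


print(interpret(None, in_recovery=False))                      # not a standby
print(interpret(None, in_recovery=True))                       # no WAL receiver process
print(interpret({"status": "restarting"}, in_recovery=True))   # receiver present, status=restarting
```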

Combine pg_stat_wal_receiver with pg_stat_replication on the primary and you have two independent views of the same replication stream. When they disagree, the disagreement itself is diagnostic.
