Postgres pg_stat_replication: Watching Standbys From the Primary
The view that tells you how far behind each replica is, in three different units, with the asymmetric reading discipline that makes the difference between catching problems early and finding out at failover.
One of the easiest production failure modes with replicated Postgres is the silent replica lag. The primary is healthy, the replica is healthy, replication is technically running, but the replica has been falling behind for hours because some bulk job on the primary is generating WAL faster than the replica can apply it. The view that lets you catch this before it matters is pg_stat_replication, queried on the primary, with a small reading discipline that the documentation does not emphasize as strongly as it should.
What the view shows
pg_stat_replication has one row per connected replica. The connection is open as long as the replica is receiving WAL, which means a disconnected replica produces no row at all rather than a row with stale data. This is correct behavior but easy to misread: the absence of a row for an expected replica is itself a critical signal.
The columns that matter day-to-day fall into three groups. The identity columns (pid, usename, application_name, client_addr) tell you which replica the row is about, with application_name as the load-bearing column because operators set it via the primary_conninfo connection string on each replica. The state column shows the replication phase: startup during initial connection, catchup while replaying accumulated WAL, streaming during normal operation. Anything other than streaming for an extended period is worth investigating.
The lag columns are the heart of the view. Postgres reports lag in three different units: byte lag (write_lag, flush_lag, replay_lag as durations in 14+; sent_lsn, write_lsn, flush_lsn, replay_lsn as positions in earlier versions), time lag (write_lag, flush_lag, replay_lag as time durations), and a separate sync_state column showing whether the replica is async, sync, or quorum.
The three units of lag
Byte lag tells you how many WAL bytes the primary has generated that the replica has not yet received, written, flushed, or replayed. The four positions correspond to four stages: sent_lsn is what the primary has shipped, write_lsn is what the replica has acknowledged receiving into its buffer, flush_lsn is what the replica has written to disk, replay_lsn is what the replica has actually applied to its data files.
The gap between flush_lsn and replay_lsn is the most common diagnostic signal: the replica is receiving and durably storing WAL but cannot apply it fast enough to its data files. This typically indicates the replica is single-threaded on apply (which all Postgres replicas are by default) while the primary has multiple writers generating WAL in parallel.
Time lag in the *_lag columns expresses the same information as a duration: how long ago on the primary was the WAL position the replica is currently at. This is the unit operators usually want, because byte counts are hard to interpret without knowing the WAL generation rate. A replay_lag of 2 minutes means the replica is showing data that is 2 minutes stale, regardless of whether that is 1MB or 10GB of WAL.
The sync_state column is independent of lag: it shows whether the replica is participating in synchronous replication. A sync replica with persistent lag indicates that primary commits are being blocked waiting for it, which is usually worse than the lag itself.
The diagnostic queries
The everyday query is short: pull one row per replica with the columns that matter for catching problems.
SELECT application_name, client_addr, state, sync_state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS bytes_behind,
replay_lag
FROM pg_stat_replication
ORDER BY bytes_behind DESC NULLS LAST;The pg_wal_lsn_diff function computes the byte gap between two LSNs, which is the most useful single-number summary of how far behind a replica is. The replay_lag column gives the time equivalent. Together they cover both the question of how much data the replica is missing and how stale that data is.
The alerting query is more selective: alert when bytes_behind exceeds some threshold (say, 100 MB) for any replica, or when replay_lag exceeds some duration (say, 30 seconds for a read-heavy replica or 5 minutes for an analytics replica). The right thresholds depend on what the replicas are used for: a replica that fronts user-facing reads needs sub-second lag, a replica that runs nightly reports can tolerate hours.
The missing-replica problem
The most dangerous failure mode is not high lag but a disconnected replica that the monitoring does not notice because there is no row in pg_stat_replication to alert on. The mitigation is to monitor for expected replicas being present, not just for connected replicas having low lag.
The way to do this is to maintain a list of expected replicas (by application_name) elsewhere—monitoring configuration, an internal table, the Kubernetes manifest, wherever—and alert when the set of connected application_names differs from the expected set. A replica that has been disconnected for hours produces zero rows in pg_stat_replication and zero alerts from naive lag monitoring.
The related failure mode is the replica that is connected but has fallen so far behind that the primary has discarded the WAL it needs. In this case the replica disconnects and cannot reconnect, and the only path forward is rebuilding it from a fresh base backup. The wal_keep_size setting (or replication slots) prevents this, but the setting only works if it is large enough for the worst-case lag the system can tolerate.
What pg_stat_replication does not show
The view shows the state of replication on the primary side: how far behind each connected replica is. It does not show why the replica is behind. The why usually requires querying pg_stat_activity on the replica itself to see whether the recovery process is blocked on disk I/O or CPU or waiting for a lock.
The view also does not show historical lag. The columns are point-in-time, so a replica that briefly fell behind and caught up will look fine in pg_stat_replication. For trend analysis, sample pg_stat_replication on a schedule and store the results in a table.
And the view does not show logical replication. Logical replication slots show up in pg_stat_replication only when an active subscriber is connected; the per-publication state lives in pg_stat_subscription on the subscriber side. For logical replication the diagnostic story is split across both sides of the connection.
Our use across products
Our four products (DocuMint, CronPing, FlagBit, WebhookVault) currently run on SQLite, where the closest analog is a sidecar replication setup with Litestream or similar, and the equivalent monitoring is checking that the WAL shipping process is healthy. The Postgres migration plan includes pg_stat_replication as part of the standard monitoring dashboard from day one, with both lag-threshold alerts and missing-replica alerts. Streaming replication is one of the operational areas where the cost of monitoring discipline shows up as the difference between catching problems before they matter and discovering them at failover.
Deeper observation
A pattern in Postgres operational tooling is that the views expose exactly enough information to catch problems before they become incidents, but only if you know how to read them. pg_stat_replication is a good example: the columns are all there, the relationships between them are documented, but the discipline of querying it on a schedule, comparing against an expected set, alerting on the right thresholds, and storing the history is mostly invented locally by every team. The features Postgres exposes are dense with operational meaning, but the operational meaning has to be unpacked by each operator who runs it.