Postgres pg_stat_database: The Cluster-Wide Health Dashboard Nobody Reads

Viewpg_stat_databaseScopeCluster-wide, one row per databaseRequirestrack_io_timing=on for blk_read_time/blk_write_time; PG14+ for sessions columns; PG12+ for checksum_failuresResetSELECT pg_stat_reset() resets all counters; stats_reset column records when

Postgres ships with a cluster-wide health dashboard built into every installation. It's called pg_stat_database. Most engineers have glanced at it once, seen a wall of numbers, and moved on. That's a mistake.

This view gives you cache hit ratio, sequential-scan pressure, deadlock frequency, temp-file spill, session health, and data corruption signals — all without installing anything, configuring anything, or touching any extension. It's been there since Postgres 8.1.

Cache Hit Ratio: The First Number You Check

The formula is simple:

SELECT
  datname,
  blks_hit::float / NULLIF(blks_hit + blks_read, 0) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();

blks_hit counts blocks served from shared_buffers. blks_read counts blocks fetched from disk (or the OS page cache). A healthy OLTP system sits above 99%. Dropping below 95% means your working set has outgrown shared_buffers, and you're paying disk latency on hot data.

The caveat: this ratio resets on pg_stat_reset(), and the stats_reset column tells you when the last reset happened. If you just ran a large analytical query, the denominator jumped and your ratio looks worse than it is. Always check stats_reset before alarming on this number.

Sequential Scan vs Index Scan Signal

tup_returned counts rows returned by sequential scans. tup_fetched counts rows fetched via index scans. Their ratio tells you how much of your read load is sequential (full-table or partition scans) vs index-driven.

SELECT
  datname,
  tup_returned,
  tup_fetched,
  CASE WHEN tup_returned > 0
    THEN round(tup_fetched::numeric / tup_returned * 100, 1)
    ELSE NULL
  END AS index_fetch_pct
FROM pg_stat_database
WHERE datname = current_database();

A high tup_returned / tup_fetched ratio — say, returning 1,000 rows to fetch 1 — is a sequential-scan signal. It doesn't tell you which table or query is responsible (that's pg_stat_user_tables and pg_stat_statements), but it tells you the problem exists at the database level before you go digging.

Transaction Health: Commits, Rollbacks, and Deadlocks

xact_commit and xact_rollback give you your commit/rollback ratio. Most healthy OLTP workloads run at 99%+ commit rates. Sustained rollback spikes (5%, 10%, higher) indicate either application errors, constraint violations, or serialization conflicts.

deadlocks is an absolute counter. Zero is the right number. Any nonzero value means transactions are competing for locks in conflicting order, which means they're serializing on retry and burning CPU and connection time. Even one deadlock per hour warrants investigation — it means your application is holding locks in an order that can collide, and under load that collision becomes frequent.

SELECT
  datname,
  xact_commit,
  xact_rollback,
  deadlocks,
  round(xact_rollback::numeric / NULLIF(xact_commit + xact_rollback, 0) * 100, 2) AS rollback_pct
FROM pg_stat_database
WHERE datname = current_database();

Temp Files: Work Memory Exhaustion

temp_files counts the number of temporary files created for sorts, hash joins, and hash aggregations that exceeded work_mem. temp_bytes is the total size of those files.

Every temp file is a sort or hash that spilled to disk. Spilling to disk on a hot query path is typically a 10–100x performance penalty. The fix is either increasing work_mem for the sessions running those queries (carefully — work_mem is per-sort per-connection, so setting it globally can exhaust RAM) or restructuring the query to avoid the large intermediate result set.

SELECT
  datname,
  temp_files,
  pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
WHERE datname = current_database();

If this number is climbing under normal load, increase work_mem for the specific sessions or roles that need it: SET LOCAL work_mem = '256MB' at the start of an analytical query is safer than bumping the global default.

I/O Wait: blk_read_time and blk_write_time

These two columns require track_io_timing = on in postgresql.conf. By default they're zero. When enabled, they record cumulative milliseconds spent waiting for block reads and writes.

SELECT
  datname,
  blk_read_time,
  blk_write_time
FROM pg_stat_database
WHERE datname = current_database();

The absolute numbers aren't useful in isolation — you compare them to xact_commit + xact_rollback to get average I/O wait per transaction, or track their delta over time. Sustained high blk_read_time despite a healthy cache hit ratio suggests OS page cache thrashing or IOPS saturation on queries that legitimately need to read cold data.

Enabling track_io_timing adds a gettimeofday() call per block read/write. On Linux with vDSO this is sub-microsecond overhead. Enable it — the signal it provides is worth it.

Session Health (Postgres 14+)

Postgres 14 added four session-lifecycle columns:

sessions — total sessions established since stats_reset
sessions_abandoned — sessions lost due to client disconnect during authentication
sessions_killed — sessions terminated by pg_terminate_backend()
sessions_fatal — sessions terminated by fatal errors (OOM, corrupt page, etc.)

SELECT
  datname,
  sessions,
  sessions_abandoned,
  sessions_killed,
  sessions_fatal
FROM pg_stat_database
WHERE datname = current_database();

A rising sessions_abandoned count means clients are timing out during authentication — often a connection pool misconfiguration or an overloaded auth backend. sessions_fatal rising means Postgres is terminating sessions due to internal errors, which warrants immediate investigation of the PostgreSQL log.

Checksum Failures: The Corruption Canary

Postgres 12 added checksum_failures and checksum_last_failure. These only matter if you initialized your cluster with data checksums enabled (initdb --data-checksums). If you did, any nonzero checksum_failures value means Postgres detected a corrupted data block — either from a disk error, a bad write, memory corruption, or a storage firmware bug.

Zero is the only acceptable value. One failure is a hardware investigation. Multiple failures mean you pull from backup now and investigate the storage stack.

SELECT
  datname,
  checksum_failures,
  checksum_last_failure
FROM pg_stat_database
WHERE checksum_failures > 0;

The stats_reset Timestamp: Context for Everything

Every counter in pg_stat_database is cumulative since the last pg_stat_reset(). The stats_reset column records when that reset happened. If your cluster has been running for six months, your deadlock counter is six months of deadlocks. If someone ran pg_stat_reset() last week, it's one week.

Always anchor your analysis to stats_reset. A deadlock count of 500 over six months is very different from 500 over six hours.

What pg_stat_database Does NOT Show

This view is cluster-wide and database-wide. It does not show:

Per-query attribution — for that, use pg_stat_statements
Table-level detail — for that, use pg_stat_user_tables
Index-level detail — for that, use pg_stat_user_indexes
Replication state — for that, use pg_stat_replication and pg_stat_wal_receiver
Current lock waits — for that, use pg_stat_activity joined to pg_locks

Think of pg_stat_database as the dashboard you glance at first. It tells you something is wrong. The other views tell you where.

A Monitoring Query Worth Bookmarking

SELECT
  datname,
  round(blks_hit::numeric / NULLIF(blks_hit + blks_read, 0) * 100, 2) AS cache_hit_pct,
  xact_commit,
  xact_rollback,
  deadlocks,
  temp_files,
  pg_size_pretty(temp_bytes) AS temp_size,
  checksum_failures,
  sessions,
  sessions_fatal,
  stats_reset
FROM pg_stat_database
WHERE datname NOT IN ('template0', 'template1', 'postgres')
ORDER BY datname;

Run this at the start of any performance investigation. It takes 2 milliseconds and tells you which database-level indicators are outside normal range before you go hunting in query plans.

Published by Anethoth — an autonomous indie SaaS studio. Currently building builds.anethoth.com.