Postgres EXPLAIN (BUFFERS): Finding the Queries That Trash Your Cache

Every Postgres engineer learns EXPLAIN ANALYZE eventually. Fewer learn to add BUFFERS. That's a mistake. The BUFFERS option is where you stop guessing and start seeing what your query actually cost the database's shared memory.

What BUFFERS Shows

Run a query with EXPLAIN (ANALYZE, BUFFERS) and you get four new numbers for each node in the plan: shared hit, shared read, shared dirtied, and shared written.

shared hit: blocks found in shared_buffers (no disk I/O). This is free. You want this number to be as high as possible.

shared read: blocks not found in shared_buffers and fetched from disk (or the OS page cache). This is expensive. Every shared read is either a disk seek or a page fault. A query with thousands of shared reads and hundreds of shared hits is telling you something is wrong.

shared dirtied: blocks modified by the query (turned dirty, not yet written). Expected for writes.

shared written: blocks written to disk during the query because the checkpointer couldn't keep up. This usually indicates checkpoint pressure, not a query problem directly — but it's worth noting.

Reading the Output

Here's a canonical BUFFERS example:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 12345;

Output fragment:

Index Scan using orders_customer_id_idx on orders
  (cost=0.43..8.45 rows=12 width=120)
  (actual time=0.082..0.118 rows=12 loops=1)
  Buffers: shared hit=16 read=2

Sixteen blocks from cache, two from disk. For 12 rows, that's reasonable. Now look at a sequential scan on the same table:

Seq Scan on orders
  (cost=0.00..45120.00 rows=1200000 width=120)
  (actual time=0.032..8421.543 rows=1200000 loops=1)
  Buffers: shared hit=1200 read=43920

Forty-three thousand disk reads. That query is pulling almost the entire table from disk every run. If you're seeing this for a table your application queries constantly, your shared_buffers is either too small or this table doesn't fit — and you need to know that.

Temp Reads and Writes

BUFFERS also shows temp I/O: temp read and temp written. These appear when a sort or hash operation spills to disk because it exceeded work_mem. A query showing temp blocks is a direct indication that work_mem is too small for the operation in question.

Sort  (cost=...)
  Buffers: shared hit=8 read=1200, temp read=4321 written=4321

Four thousand temp blocks read and written means this sort spilled to disk. Increase work_mem for this session or globally and run again — the temp blocks should disappear if the sort fits in memory.

The per-session setting is safe for experimentation:

SET work_mem = '256MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

Local Buffers

If your query touches temporary tables (not temp files — actual CREATE TEMP TABLE), you'll see local buffers instead of shared buffers. Local buffers are per-session and not shared across connections. They follow the same hit/read pattern but are entirely local to your session. High local reads mean your temp table operations are not fitting in temp_buffers (default 8MB, separate from work_mem).

WAL Records (Postgres 13+)

Postgres 13 added WAL accounting to BUFFERS output:

WAL: records=42 fpi=3 bytes=18432

records: WAL records generated. fpi: full-page images written (first modification after a checkpoint). bytes: total WAL bytes written. High FPI counts after a recent checkpoint are expected and normal. Unexpectedly high WAL generation from a routine SELECT (rare, but possible with MVCC cleanup) is worth investigating.

Combining With pg_stat_statements

pg_stat_statements accumulates buffer stats across all calls: shared_blks_hit, shared_blks_read, temp_blks_read, temp_blks_written. This lets you identify which queries in your workload collectively generate the most disk I/O, even if individual executions look cheap. A query that runs 10,000 times per minute with 5 shared reads each is generating 50,000 disk reads per minute from a single statement.

SELECT query, calls, shared_blks_hit, shared_blks_read,
       shared_blks_read::float / NULLIF(calls,0) AS reads_per_call
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 20;

What BUFFERS Does Not Show

BUFFERS does not show you OS page cache behavior. A shared read that hits the OS page cache is fast; a shared read that goes to the physical disk is slow. The numbers look identical in BUFFERS output. If you need to distinguish them, use track_io_timing = on (adds actual I/O times) or system-level tools like pg_statio_user_tables and OS I/O statistics.

BUFFERS also doesn't show lock contention, connection overhead, or network time. For a complete picture of where query time goes, combine BUFFERS output with pg_stat_activity wait events and system-level CPU/IO metrics.

The Discipline

Make EXPLAIN (ANALYZE, BUFFERS) your default for query investigation. The extra two words cost nothing and reveal everything about memory efficiency. If you see shared reads dominating shared hits on a hot table, that's your next investigation target.

Looking for more infrastructure writing? Find it at builds.anethoth.com.