engineering

Reading EXPLAIN ANALYZE BUFFERS: The Memory and IO Story Behind Query Plans

EXPLAIN ANALYZE tells you how long each operator took. EXPLAIN ANALYZE BUFFERS tells you why. The buffer accounting separates cache hits from disk reads and turns most performance mysteries into questions about cache fit.

Anethoth

17 May 2026 — 5 min read

The default EXPLAIN ANALYZE output gives times and row counts. That is enough to identify which operator is slow and roughly how much work it is doing. It is not enough to explain why a query that ran in 20 milliseconds yesterday is taking 2 seconds today. The same plan, the same row counts, the same operators, but an order of magnitude difference in runtime. The missing information is what the data did before it reached the executor: was it already in the shared buffer cache, or did Postgres have to read it from disk?

The BUFFERS option exposes the buffer accounting layer underneath the executor. Adding it to EXPLAIN ANALYZE (BUFFERS) annotates each operator with the number of shared blocks hit (already in cache), read (fetched from disk or OS page cache), dirtied (modified by this query), and written (flushed to disk during the query). For most performance investigations, the hit-vs-read split is the load-bearing distinction.

What the numbers mean

A shared buffer hit is a 8KB page that was found already resident in Postgres's shared_buffers memory. The cost is nominal: a hash lookup and a memcpy. A shared buffer read is a 8KB page that Postgres asked the operating system for. The OS may serve it from its own page cache (fast, microseconds) or from disk (slow, milliseconds on SSD, tens of milliseconds on spinning disks). Postgres cannot distinguish these two cases in the BUFFERS output.

The implication for diagnosis: if BUFFERS shows mostly hits, the query is bottlenecked on CPU and the executor. If BUFFERS shows mostly reads, the query is bottlenecked on memory subsystem and IO. The interventions are different: a hits-bottlenecked query gets faster from better indexes or query rewriting, a reads-bottlenecked query gets faster from more RAM or warmer caches or different access patterns that reduce the working set.

The temp blocks counters cover spills to temporary disk files: sorts and hash joins that exceeded work_mem. Any non-zero number in these columns is a flag. A sort that spills to disk runs roughly 10-100x slower than one that fits in memory. The fix is either increasing work_mem (for the affected session) or reducing the data volume the sort sees by filtering earlier.

The canonical diagnosis pattern

Run EXPLAIN (ANALYZE, BUFFERS) ... on the slow query. Look at the top-level operator first to see total buffer activity. If reads are zero and hits are reasonable, the query is doing CPU-bound work and the executor itself is the slow part. If reads are substantial, the query is fetching pages that were not in shared_buffers.

Then drill into the operator tree. For each operator with significant time spent, examine its buffer numbers. The operator that accounts for the most reads is the one that benefits most from being warm. If the largest read consumer is a sequential scan, the table is being walked from disk because the relevant pages are not cached. If it is an index scan, the index might fit in cache but the heap pages it lookups against do not.

A useful trick: run the query twice in a row. The first run measures cold cache behavior. The second run measures warm cache behavior. The difference is the cost of cache misses. If the second run is much faster, the query is sensitive to cache state and the production hit-or-miss depends on what else is competing for buffers. If the second run is not faster, the query is doing genuine work that does not benefit from caching, and the optimization needs to reduce work, not improve cache fit.

The shared_buffers sizing question

The BUFFERS output gives empirical evidence about cache fit. If queries routinely show many reads, the working set is exceeding shared_buffers. The traditional rule of 25% of system RAM for shared_buffers is a starting point; the actual right size depends on workload. For mostly-read workloads with hot data, larger shared_buffers reduces reads. For write-heavy workloads, larger shared_buffers increases checkpoint pressure and can hurt throughput.

The pg_buffercache extension complements BUFFERS by showing what is actually resident. Querying pg_buffercache reports which relations and which pages are in shared_buffers right now. This is the static view; BUFFERS gives the dynamic per-query view. Both together build a picture of what the cache is doing.

The OS page cache also matters and is harder to observe. A buffer read in BUFFERS counts the same way regardless of whether the page came from RAM (OS cache) or disk. On a system with substantial free RAM, the OS cache absorbs much of what shared_buffers misses, and the actual disk traffic is much lower than BUFFERS reads suggest. The iostat output and pg_stat_io view (Postgres 16+) provide the OS-level view.

The IO pattern question

BUFFERS counts logical IO at the page level. It does not distinguish sequential from random access patterns. A sequential scan that reads 1000 pages in physical order is far cheaper than an index scan that reads 1000 pages scattered across the table. Both show as 1000 reads in BUFFERS. The runtime difference can be 10x or more on spinning disks and noticeable even on NVMe.

The diagnostic complement to BUFFERS is plan shape. Sequential scans with high read counts are usually fine if the alternative would be more random IO from an index scan. Index scans with high read counts on the heap (after a Bitmap Heap Scan or Index Scan node) are flagged by the rows_removed_by_filter counter: if the index returns 100K candidate rows but only 1K match the filter, the heap reads are 99% wasted. The fix is usually an index on the filtered column or a composite index that satisfies more of the WHERE clause.

Track lookup vs heap fetch

For index-only scans, BUFFERS distinguishes between reads of the index pages themselves and reads of the heap (for visibility map misses). An index-only scan that should never touch the heap but shows substantial heap reads indicates a problem with the visibility map: the table has not been vacuumed recently, the visibility map is stale, and the supposedly index-only scan is doing heap fetches to check tuple visibility. The fix is VACUUM on the affected table. The diagnostic is the BUFFERS output showing reads on heap pages during an Index Only Scan node.

The auto_explain integration

Setting auto_explain.log_min_duration with auto_explain.log_buffers = on in production captures BUFFERS data for slow queries automatically. The first time a query becomes slow, the log captures not just the plan but the buffer state, which is critical for diagnosing intermittent issues that depend on cache state.

The cost is log volume: BUFFERS output adds 50-100 bytes per operator to each logged plan. The volume is manageable for queries above 1 second of duration and worth the cost. For very high-volume databases, auto_explain.sample_rate reduces the load to a manageable level while preserving diagnostic value.

What this does and does not catch

BUFFERS catches all the cache-related performance issues that are usually the hardest to diagnose from plan shape alone. It does not catch issues outside the executor: connection pool contention, lock waits, replication lag, WAL flushing, autovacuum competition. Those require their own diagnostics. But for the class of issues where the same plan produces different runtimes on different occasions, BUFFERS is usually the first place to look.

The mental model is that the executor and the storage layer are separate concerns, and most performance work focuses on the executor. BUFFERS exposes the storage layer in enough detail to attribute time to the underlying cause. Once attributed, the remediations are well-known: bigger shared_buffers for read-bottlenecked queries, better indexes for waste-removing queries, smaller working sets for hot paths, and proper vacuum discipline for index-only scans.

Across our four products

We run DocuMint, CronPing, FlagBit, and WebhookVault on SQLite, which has a simpler storage model and a different diagnostic toolkit. The SQLite equivalent is EXPLAIN QUERY PLAN and sqlite3_status() counters for cache statistics; the executor is single-threaded and the cache is process-local, so the analysis is simpler but the principles transfer. When we eventually graduate to Postgres for any of these products, the BUFFERS-driven workflow is what will replace the per-query timing we currently use.

The deeper observation is that database performance is mostly a story about which data is in memory and how it got there. The query plan is the surface; the cache state is the substrate. Once you can read both, the queries that get slow under load make sense as predictable consequences of cache pressure rather than mysterious regressions. The instrument for reading the substrate is BUFFERS, and most Postgres production teams underuse it.

Reading EXPLAIN ANALYZE BUFFERS: The Memory and IO Story Behind Query Plans

Anethoth

What the numbers mean

The canonical diagnosis pattern

The shared_buffers sizing question

The IO pattern question

Track lookup vs heap fetch

The auto_explain integration

What this does and does not catch

Across our four products

Read more

How Treehoppers Communicate Through Plant Stems: The Strange Substrate-Borne Vibrational Network

The Forgotten History of the Microwave Oven: How Radar Engineering Reshaped the Kitchen

Postgres pg_settings: Reading and Reasoning About Configuration at Runtime

Designing API Webhook Payloads: Snapshots vs References and the Right Default for B2B SaaS