Postgres pg_buffercache: Understanding What Actually Lives in shared_buffers

The shared_buffers configuration parameter is one of the most-discussed and least-examined Postgres knobs. The 25-percent-of-RAM rule of thumb is correct often enough that most teams leave it there forever, but the rule answers the wrong question. The question is not how much memory to allocate; it is whether the memory you allocated is doing the work you assumed it would.

The pg_buffercache extension is the introspection surface for shared_buffers. It exposes a view that maps every 8KB buffer frame in the cache to the database, table, fork, and block it currently holds, along with usage count and pinning state. Querying this view answers the questions that the bare configuration parameter cannot: which tables are dominating the cache, what cache hit ratio each table is actually getting, whether shared_buffers is too small for the working set or too large for the workload, and which queries are creating cache pressure on which tables.

What pg_buffercache exposes

The extension is built into Postgres but must be loaded with CREATE EXTENSION pg_buffercache. It adds a view named pg_buffercache with one row per buffer in shared_buffers. The interesting columns: bufferid is the buffer-frame index, relfilenode is the on-disk file identifier of the relation occupying this buffer (joinable to pg_class.relfilenode), reldatabase is the database OID, relforknumber distinguishes the main fork from FSM, VM, and init forks, relblocknumber is the block within the file, isdirty indicates a modified-but-not-yet-flushed buffer, usagecount is the clock-sweep counter (0-5) that drives eviction, and pinning_backends is the count of backends currently pinning this buffer.

The view scans every buffer in shared_buffers under a brief lightweight lock. The cost is proportional to the cache size; for an 8GB shared_buffers (one million 8KB buffers), the scan takes a few hundred milliseconds and briefly holds locks that can block other operations. Run pg_buffercache queries during normal operation but be aware that very frequent sampling is not free. Most diagnostic patterns sample once per minute or on-demand during incidents, which is light enough that the impact is invisible.

The per-table cache-share query

The first useful pg_buffercache query: which tables are taking the most cache space? SELECT c.relname, count(*) * 8 / 1024 AS mb_cached FROM pg_buffercache b JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid) WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database())) GROUP BY c.relname ORDER BY mb_cached DESC LIMIT 20.

The output is usually surprising. Teams that have not looked before find that one or two tables are taking 60 percent of the cache, often because of a frequent full-table scan or because the table is genuinely large and frequently accessed. The smaller tables that handle most of the request volume often take much less cache than expected because their working set is small. The pg_attribute and pg_class system catalogs almost always show up in the top 20 because Postgres consults them on nearly every query.

The diagnostic value of this query is comparing what dominates the cache against what the application would benefit from caching. If the cache is dominated by a table that is read once per day in batch jobs while a 100x-per-second table is constantly evicted from cache, the workload and the configuration are misaligned. The fix is usually not to grow shared_buffers; it is to fix the query that is scanning the batch table during peak request load.

The cache-hit-ratio-per-table calculation

The standard cache-hit-ratio metric from pg_stat_user_tables aggregates across the lifetime of the statistics counters. The pg_buffercache view shows the current cache state, and combining the two gives a finer view: which tables have most of their working set in cache, and which are constantly being read from disk despite being heavily used.

The pattern: SELECT relname, heap_blks_read, heap_blks_hit, round(100.0 * heap_blks_hit / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS hit_ratio FROM pg_statio_user_tables ORDER BY heap_blks_read DESC LIMIT 20. The hit_ratio column is what most teams already track, but the heap_blks_read column is more diagnostic; a table with a 99 percent hit ratio but a high absolute heap_blks_read count is still doing a significant amount of disk IO in absolute terms.

The complementary pg_buffercache query shows what fraction of the table is currently cached: SELECT c.relname, pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size, pg_size_pretty(count(*) * 8192) AS cached, round(100.0 * count(*) * 8192 / pg_total_relation_size(c.oid), 2) AS pct_cached FROM pg_buffercache b JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid) WHERE b.relfilenode IS NOT NULL GROUP BY c.oid, c.relname ORDER BY count(*) DESC LIMIT 20.

The pct_cached column is the load-bearing diagnostic. A table that is 100 percent cached has its entire data in memory; reads are pure RAM. A table that is 10 percent cached has 10 percent in memory and 90 percent on disk; whether reads hit the cache depends on which 10 percent and which queries. A table that is much larger than shared_buffers and shows usage activity is constantly competing with other tables for cache space, and any query that scans it produces cache pressure on the rest of the workload.

The usagecount distribution

The usagecount column drives Postgres's clock-sweep eviction. Buffers start at usagecount 1 and increment to a maximum of 5 each time they are accessed; the background eviction sweep decrements counts and reclaims buffers that reach 0. The distribution of usagecount across the cache tells you how much of the cache is hot versus cold.

A healthy cache has most buffers at usagecount 1 or 2 (recently allocated but not heavily reused), some at 3-5 (genuinely hot), and few at 0 (about to be evicted). A cache where most buffers are at 0 indicates eviction pressure: more new data is coming in than the existing data is being reused. A cache where most buffers are at 5 indicates that the working set fits comfortably and the cache is being maintained without pressure; in this case, growing shared_buffers gives no benefit because there is already idle capacity.

The query: SELECT usagecount, count(*) AS buffers, pg_size_pretty(count(*) * 8192) AS size FROM pg_buffercache GROUP BY usagecount ORDER BY usagecount. The output is one row per usagecount level; reading the distribution tells you whether the cache is full of hot data, cold data, or under pressure.

The reverse-lookup pattern

The third common pg_buffercache use case is reverse-lookup: given a query that seems slow, what does its cache state look like? The pattern is to run the query, then immediately query pg_buffercache for the tables it touched. If a query is doing many physical reads, the touched tables are probably not well cached; if the query is doing many logical reads but few physical reads, the data is in cache and the slowness is somewhere else.

The diagnostic decision matrix: high logical reads plus low physical reads plus slow runtime means CPU or lock contention (look at pg_stat_activity wait events); high logical reads plus high physical reads plus slow runtime means cache miss (look at pg_buffercache distribution); low logical reads plus low physical reads plus slow runtime means waiting on something other than the database (look at network or application code).

What pg_buffercache does not show

The view is per-database in the column listing, but covers all databases when scanned by a superuser. To restrict to the current database, join on reldatabase. The view does not show the OS page cache, which is below shared_buffers in the IO hierarchy and is often larger. A table that is 0 percent cached in shared_buffers may still be entirely in OS page cache, in which case reads are still fast even though Postgres reports them as cache misses; this is the rationale behind setting shared_buffers to a smaller fraction of RAM than would naively maximize cache. The hot path through OS cache is genuinely fast and is the reason that on a server with abundant RAM the 25-percent rule continues to work well.

The view also does not show write cache state in a useful way. The isdirty column indicates buffers that have been modified but not yet flushed, but the rate of dirty-buffer production and flushing is better tracked through pg_stat_bgwriter, which exposes counters for checkpointer and background writer activity.

The configuration question

The conventional wisdom on shared_buffers sizing is 25 percent of system RAM, capped at roughly 8GB on systems with more than 32GB. The cap exists because Postgres maintains the cache via clock-sweep eviction with explicit pinning, which scales sublinearly with cache size; doubling shared_buffers from 8GB to 16GB does not double the cache effectiveness, and at very large sizes the maintenance overhead starts to dominate.

The pg_buffercache view is how you validate whether the conventional 25 percent is right for your workload. If pg_buffercache shows the cache is dominated by tables you expect (working set well-sized for cache) and usagecount distribution is heavy at the high end (cache is comfortable, not under pressure), the configuration is appropriate and growing shared_buffers will not help. If the cache is dominated by tables you did not expect or usagecount distribution is heavy at 0-1 (constant eviction pressure), the workload-versus-cache mismatch is the actual problem and growing shared_buffers is only one possible fix; the other fixes are query optimization, index changes, and partitioning to reduce scan size.

Across our four products

We run DocuMint, CronPing, FlagBit, and WebhookVault on SQLite, which has no comparable cache-introspection view. SQLite uses the page cache configured by PRAGMA cache_size and the underlying OS page cache for files; both are opaque from the application side. The pg_buffercache equivalent does not exist for SQLite, and we accept the opacity as part of the operational simplicity trade-off. When we migrate any product to Postgres in the future, the cache-state visibility this extension provides will be one of the qualitative gains worth the migration cost.

The deeper observation is that database tuning advice is often phrased as configuration values, but the right configuration depends on what the workload actually does. The conventional rules of thumb are correct for typical workloads, and pg_buffercache is the diagnostic surface that confirms whether your workload is typical or unusual in ways that the rules of thumb do not capture. Postgres is unusual among databases in giving operators this much visibility into its own internals, and the operators who develop the habit of looking are the ones who configure databases that consistently outperform default-configured peers.

Read more