Postgres random_page_cost: The Planner Knob That Shapes Index Decisions
random_page_cost is the single most important storage-cost parameter the planner uses. Its 4.0 default reflects spinning rust. On SSD and NVMe storage, the right value is between 1.0 and 1.5, and changing it shifts plan choices for thousands of queries at once.
The Postgres planner picks query plans by estimating their cost in arbitrary cost units. The unit is calibrated so that a sequential page read costs 1.0. Every other operation is priced relative to that baseline. The most consequential of those relative prices is random_page_cost, the cost of a single random page read. Its default value of 4.0 means the planner believes random I/O is four times more expensive than sequential I/O. On spinning disks, that ratio was approximately correct. On modern SSD and NVMe storage, it is wrong by a factor of three or four.
Why the ratio matters
Almost every interesting plan-choice decision the planner makes depends on this ratio. An index scan reads pages in a pseudo-random order determined by the index structure. A sequential scan reads pages in physical order. When deciding between them, the planner multiplies estimated page counts by the appropriate per-page cost and picks the lower total. If random_page_cost is too high, the planner thinks indexes are more expensive than they actually are and falls back to sequential scans on tables where indexes would be faster. If random_page_cost is too low, the planner thinks indexes are cheaper than they actually are and picks them in cases where a sequential scan would do less work.
The ratio also shapes join algorithm choice. Nested loop joins with an inner index lookup are priced as repeated random reads. Hash joins build a hash table and probe it sequentially. Merge joins read both inputs in sort order. The relative cost of the three depends on random_page_cost, and changing it shifts the crossover points.
The default and its history
The 4.0 default dates to the 1990s. The Postgres source has the constant documented as approximating the relative cost of random versus sequential I/O on 7200 RPM consumer disks. On that hardware, a sequential read could pipeline through the OS readahead and prefetch the next several blocks at no additional cost. A random read required a head seek averaging 8 to 10 milliseconds before the actual transfer. The 4 to 1 ratio was an approximation that worked across a wide range of mid-1990s storage.
SSD changed the math. A modern consumer SSD can do random 4K reads at queue depth 1 in roughly 100 microseconds, compared to roughly 60 microseconds for a sequential read at the same queue depth. The ratio is less than 2 to 1, and at higher queue depths it converges to nearly 1 to 1. NVMe SSDs collapse the gap further, with random and sequential read latencies often within 10 percent of each other.
The Postgres default has not been updated to reflect this because changing it would change plan choices on every existing database where the default is in effect, which would surprise enough users that the stability cost is judged higher than the correctness cost. The right value for any specific database is a configuration question, not a default question.
What value to pick
The honest answer depends on storage and on the ratio of cached to uncached reads. The most common production values are 1.1 for NVMe storage, 1.2 to 1.5 for SATA SSD, and the default 4.0 for spinning rust. Many databases run with effective_cache_size set high enough that most reads are served from OS cache or shared_buffers, which means the actual cost per "random" read is the cost of a memory copy rather than a disk operation. In that regime, values near 1.0 are appropriate, and some teams set it as low as 1.05 to keep the cost slightly above sequential for tie-breaking.
The conservative migration is to halve the default. From 4.0 to 2.0 produces noticeable plan changes without dramatic reversals, and lets the team observe whether the changes are improvements before going further. The aggressive migration straight to 1.1 produces larger changes and shorter feedback loops.
The interaction with effective_cache_size
random_page_cost does not act alone. The planner also uses effective_cache_size to estimate how much of a table is likely to be in cache. A high effective_cache_size combined with a low random_page_cost produces plans that favor index scans aggressively. A low effective_cache_size combined with a high random_page_cost produces plans that favor sequential scans aggressively. The two parameters should be tuned together.
The 4 GB effective_cache_size default is too low for modern hardware. A reasonable starting value is 50 to 75 percent of available RAM. The asymmetry of the error favors the lower end: too-low effective_cache_size produces some sequential-scan bias that is recoverable, while too-high effective_cache_size produces plans that assume more caching than the hardware can support and run slowly under cache pressure.
Verifying the change
The change is reversible and observable. After setting random_page_cost lower, the diagnostic is to compare EXPLAIN ANALYZE output for known-slow queries before and after. Plans that switch from sequential scan to index scan should run faster on actual hardware. Plans that already used indexes should be unaffected. Plans that switch from index scan to sequential scan are rare but possible if the ratio change crosses a threshold for that specific query.
The auto_explain extension captures this kind of plan drift automatically. Setting auto_explain.log_min_duration to 1000 ms and auto_explain.log_format to json captures every slow plan, which makes the before-and-after comparison concrete. Without auto_explain, the diagnostic has to be driven by hand against a known workload.
What the parameter does not solve
random_page_cost is a planner hint, not a runtime tuning. It does not make individual queries faster or slower. It only changes which plan the planner picks. If the plan it picks is correct, lowering random_page_cost has no effect on that query. If the plan it picks is wrong, lowering random_page_cost may shift the plan to one that is correct, but it may also shift other plans in directions the team did not intend.
The parameter does not handle column correlation. Two columns that are correlated in the data (created_at and id, for example) require extended statistics with CREATE STATISTICS to inform the planner. random_page_cost does not change the planner's cost estimate for a misleading row estimate; it changes the per-page cost the planner multiplies by an already-wrong row estimate.
The parameter does not help with queries that hit pathological plans because of stale ANALYZE statistics. The fix for those is to run ANALYZE or to raise default_statistics_target on the affected columns. random_page_cost is downstream of statistics.
The deeper observation
random_page_cost is one of the small set of Postgres parameters where the right value depends on hardware in ways the default cannot capture. The 4.0 default was correct for the era it was written in. It is not correct for modern storage. The reason it has not been changed is that the cost of changing defaults that affect plan choice for every existing user is judged higher than the cost of leaving teams with a value that requires per-deployment thought. That is probably the right call for the project, but it means the parameter is a thing every Postgres team has to confront separately, and most teams confront it after a slow-query incident rather than as part of initial configuration.
Read more essays and technical writing at anethoth.com — a notebook on databases, distributed systems, biology, and the engineering that holds the world together.