Postgres Sequential Scans vs Index Scans: When the Planner Picks Right and When It Doesn't
The intuition that sequential scans are slow and index scans are fast is correct most of the time and wrong in the cases that produce the most surprising query plans. Understanding when the planner picks sequential scan over an existing index is the difference between fighting the planner and giv...
The reflexive reaction to seeing a sequential scan in EXPLAIN output is to add an index. The reflex is usually correct, but the cases where it is wrong are the cases that produce the most confusing query plans, and the engineer who understands when sequential scan is the right choice is the engineer who can read the planner's reasoning rather than fighting it.
The cost model
The Postgres planner picks the cheapest plan according to its cost model, where cost is roughly an estimate of disk-page reads plus tuple-processing time. Sequential scans read all pages of a table in storage order; index scans read the index, find pointers to specific rows, and fetch those rows from the heap, which may require following the pointer to a random heap page per row.
The cost asymmetry matters: sequential reads are typically 4-10x faster than random reads on modern storage even on NVMe where the absolute random-read latency is microseconds. The seq_page_cost default of 1.0 and random_page_cost default of 4.0 reflect this. When the index scan would fetch a large fraction of the rows in the table, the random-read cost of fetching them through the index exceeds the sequential-read cost of just reading the whole table.
The crossover point depends on row size, index correlation, and selectivity. For uncorrelated rows on spinning disks, the crossover was around 5-10% of the table; on NVMe with the random_page_cost adjusted to 1.1 or 1.5, the crossover moves to 15-30%. Sequential scan is the right answer for queries that touch a significant fraction of the table, and the planner picks it because the cost model gets the comparison right.
When the planner picks sequential scan unexpectedly
The cases where sequential scan appears in EXPLAIN output despite the existence of a seemingly-applicable index fall into a small set of patterns.
First: the index is on a column with low cardinality and the query predicate selects too many rows. An index on a boolean column or a status column with five possible values rarely helps because matching rows are spread across most of the table. The planner correctly recognizes that following the index pointers will cost more than sequential reading.
Second: the table is small. Below a few hundred pages, sequential scan is faster than any index strategy because the index has overhead and the table fits comfortably in the cache. The planner picks sequential scan for small tables and then switches to index scans as the table grows. This is correct behavior that often surprises engineers debugging queries on test data with thousands of rows where the production queries on millions of rows will use the index.
Third: the query predicate is not in a form the index can use. WHERE lower(email) = '[email protected]' cannot use a plain index on email; it needs a functional index on lower(email). Similarly, WHERE email LIKE '%example.com' cannot use a B-tree index because the leading wildcard makes the search unbounded; it needs a trigram index. WHERE created_at + interval '1 day' > now() cannot use an index on created_at because the planner cannot prove the expression is monotonic; rewriting to WHERE created_at > now() - interval '1 day' lets the index work.
Fourth: the statistics are stale or wrong. The planner estimates how many rows the predicate will match based on the pg_stats histogram and most-common-values lists. If statistics are stale (the table has changed substantially since the last ANALYZE) or the distribution is non-uniform in a way the histogram cannot capture (a highly skewed column, correlations between columns), the planner can estimate one row when the actual count is a million or vice versa. The fix is ANALYZE, possibly with increased default_statistics_target for the column, or extended statistics on column groups.
When the planner picks index scan unexpectedly
The opposite case is also informative. The planner sometimes uses an index when sequential scan would be faster.
The most common reason is a LIMIT clause. SELECT * FROM big_table ORDER BY created_at DESC LIMIT 10 with an index on created_at can return ten rows after reading ten index entries, which is fast regardless of table size. The planner picks the index because the LIMIT bounds the work.
Another reason is index-only scans. If the query references only columns covered by the index and the visibility map indicates that most of the relevant index entries point to all-visible heap pages, Postgres can answer the query from the index alone without heap fetches. This makes the index scan dramatically cheaper than sequential scan because there is no random-read overhead.
A third reason is good correlation between the index order and physical heap order. If the table is naturally sorted by the index column (because rows are inserted in order and never updated), index scans behave like sequential scans of the heap because consecutive index entries point to nearby pages. The pg_stats.correlation column reports this; values close to 1.0 indicate the heap is roughly sorted by that column, and the planner accounts for this in cost estimation.
Reading EXPLAIN ANALYZE
The diagnostic discipline is to look at the estimate-vs-actual divergence in EXPLAIN ANALYZE. The first number on each plan node is the planner's estimate; the actual rows are reported after the arrow. If estimate and actual are within a factor of 2-3, the cost-based decision is grounded in reality and is probably right. If estimate and actual diverge by 10x or 100x, the planner is working from bad information and the decision may be wrong even if the alternative looks worse on paper.
The next thing to check is the Rows Removed by Filter counter on each scan node. A sequential scan that reads ten million rows and removes 9,990,000 of them by filter is doing 99% wasted work; an index scan on a more selective column would help. A sequential scan that reads ten million rows and keeps eight million is doing the right thing; trying to force an index scan there would be slower.
The temp-blocks counters indicate sort or hash operations spilling to disk. Sequential scans by themselves do not spill, but downstream sorts and hash joins do, and that can affect whether sequential scan plus sort is faster than index scan that returns rows in the right order.
When to override the planner
The standard advice is to give the planner better information rather than fight it: run ANALYZE, increase statistics targets, add functional indexes for non-trivial predicates, rewrite queries to use sargable predicates. These approaches make the planner's decisions correct rather than overriding them.
The escape hatch is SET LOCAL enable_seqscan = off; for the specific transaction. This is a debugging tool, not a production fix: forcing index scans where the planner chose sequential scan is usually slower in practice, and even when it is faster for the immediate query, the configuration drift produces surprises elsewhere. The reliable pattern is to use the override briefly to confirm that the planner has made a wrong choice, then find a non-override fix.
The pg_hint_plan extension lets you add hints to specific queries via comments, which is the lighter-weight alternative to global session settings for cases where you genuinely cannot fix the underlying issue.
Across our four products
We run DocuMint, CronPing, FlagBit, and WebhookVault on SQLite, whose query planner is simpler than Postgres but uses recognizably the same kind of cost-based plan selection. The EXPLAIN QUERY PLAN command is the SQLite equivalent of EXPLAIN, and the ANALYZE command (run periodically) keeps the statistics current. The planning surface is smaller but the disciplines transfer: give the optimizer better information rather than fight it, look at the actual row counts to find diverging estimates, write predicates the optimizer can use.
The deeper observation is that the planner's choice between sequential and index scans is one of the parts of the database where the cost model is most exposed to operator inspection. Reading EXPLAIN is the difference between treating the planner as a black box that sometimes makes wrong choices and treating it as a known-quantity optimizer that responds predictably to statistics quality, predicate form, and index design. The investment in reading EXPLAIN compounds: every query a team can quickly diagnose is a query they will not have to escalate, and the patterns recur enough that the literacy transfers across schemas.