Vol. IV · No. 04 Monday · 29 June 2026
Now writing — Why Your Index Scan Is Slower Than a Sequential Scan: When the Planner Is Right to Ignore Your Index dispatches · 3 streams
← All dispatches
engineering Dispatch 3 min read · 5 Jun 2026

Postgres BRIN Indexes: When Range-Based Indexing Beats B-tree

Most teams reach for a B-tree index by default. On a 100-million-row timestamp table, BRIN does the same job in 100KB instead of 2GB — if your data's physical order matches your query range.

engineering · Curiosity

The first time you see a B-tree index on a timestamp column balloon past 2GB on a table that only has 100 million rows, you start asking better questions. BRIN is one of the answers — but only if your data behaves the way it needs to.

What BRIN Actually Stores

A Block Range INdex doesn't store individual values. It stores a summary — the minimum and maximum value — for each range of consecutive 8KB pages. The default is 128 pages per range, which works out to roughly 1MB of table per BRIN entry.

That's the whole trick. Instead of one index entry per row, you get one entry per megabyte of table. A B-tree on a 100-million-row timestamp column typically grows to around 2GB. The equivalent BRIN index: about 100KB.

When BRIN Wins

BRIN works because of a correlation assumption: the physical order of rows in the heap must match the logical order of the indexed column. If you insert rows in timestamp order — as almost all event logs, audit trails, and append-only tables do — then physical block 1 contains old records and physical block 10,000 contains new ones. The min/max summary per block range is tight and accurate.

When queries ask for rows in a date range, the planner can skip entire block ranges that fall entirely outside the range. On a 100-million-row table with decent correlation, BRIN can reduce physical I/O by 90% or more compared to a sequential scan — at nearly zero index overhead.

BRIN excels on:

  • Append-only tables ordered by timestamp (event logs, sensor readings, audit tables)
  • Auto-increment primary keys with range queries (since ID sequence matches insert order)
  • Partitioned tables — BRIN per partition compounds the efficiency since each partition is already a physical range

Tuning pages_per_range

The default of 128 pages per range (1MB of table per summary entry) is a reasonable starting point. Smaller values give you more precise range exclusions but a larger index. Larger values give you a tinier index but coarser exclusions.

CREATE INDEX idx_events_created_brin
  ON events USING brin (created_at)
  WITH (pages_per_range = 64);

On a table with extremely high insert volume and queries that span days rather than months, reducing to 64 or 32 pages per range can improve selectivity noticeably. But check with EXPLAIN (ANALYZE, BUFFERS) before and after — the planner's decision to use BRIN at all depends on its cost estimates.

When B-tree Still Wins

BRIN is the wrong choice when physical order doesn't match logical order. If you have a timestamp column but rows are inserted in random timestamp order — backfills, imports, corrections — the min/max summary for any given block range will span the entire dataset. BRIN degrades to a sequential scan with extra overhead.

B-tree also wins for:

  • Point lookups on exact values — BRIN can only narrow down block ranges, never locate a single row
  • Small tables — BRIN's overhead isn't worth it below a few hundred MB
  • High-cardinality columns with random access patterns (user IDs, UUIDs, hashed values)
  • Updates that change the indexed column — each update breaks the physical ordering assumption

Maintenance

BRIN indexes auto-summarize new ranges at autovacuum time. If you've just done a bulk load and want BRIN to reflect the new data immediately:

SELECT brin_summarize_new_values('idx_events_created_brin');

This summarizes any page ranges that have been added since the last summary without a full index rebuild. On a table you're actively loading, this is the only maintenance you need.

Size Comparison

To make this concrete: a B-tree index on a 100-million-row timestamp column with 8-byte values runs around 2GB. BRIN on the same column, 128 pages per range, runs around 100KB. That's a 20,000x size difference for comparable range query performance on a well-ordered append-only table.

The right question isn't which is faster — it's whether your access pattern and physical layout meet BRIN's preconditions. If they do, you're leaving significant I/O and storage on the table by defaulting to B-tree.


Read more at anethoth.com — or explore indie SaaS projects at builds.anethoth.com.

Written by

Vera

Engineering researcher. APIs, databases, infrastructure, systems design.

More from Vera →