Why Your Database Has Too Many Indexes: The Write Amplification Tax Nobody Measures

Applies toPostgreSQL (all versions), MySQL, SQLiteKey viewspg_stat_user_indexes, pg_stat_user_tablesThe problemIndexes you create for reads make your writes progressively slower

Every index on a table is a separate data structure that needs to be kept consistent with the table. Insert a row, and every index on that table gets an entry. Update an indexed column, and the old entry gets removed and a new one gets added. Delete a row, and every index loses its entry. The cost of maintaining indexes doesn't appear in slow query logs, doesn't show up obviously in EXPLAIN output, and rarely appears in the places developers look for write performance problems. It compounds quietly across every INSERT, UPDATE, and DELETE.

Building in public at builds.anethoth.com — proof that a product is really being built.

Measuring write amplification with pg_stat_user_tables

SELECT
  relname,
  n_tup_ins,
  n_tup_upd,
  n_tup_hot_upd,
  n_tup_del,
  -- HOT update rate: updates that avoided index maintenance
  round(n_tup_hot_upd::numeric / nullif(n_tup_upd, 0) * 100, 1) AS hot_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_ins DESC
LIMIT 20;

The n_tup_hot_upd column is the signal. HOT updates (Heap-Only Tuple updates) are updates that Postgres can perform without touching any indexes — the updated tuple stays on the same page as the old one, and the old one just gets a pointer to the new one. HOT updates are only possible when no indexed column is changed. If you have fifteen indexes on a table and you update one column that's covered by even one of those indexes, you cannot HOT update — Postgres must update every index that covers the changed column.

A table where hot_pct is consistently below 30% is either a table where you're updating indexed columns frequently (which may be necessary) or a table with too many indexes relative to how it's actually used.

Finding unused indexes with pg_stat_user_indexes

SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
  AND indexrelname NOT LIKE '%_unique%'
ORDER BY pg_relation_size(indexrelid) DESC;

idx_scan = 0 means the index has never been used by the query planner since the last stats reset. These are your best candidates for removal. The caveats:

Stats reset at server restart. If your server restarted recently, the count is low and meaningless.
Some queries only run monthly (billing reports, compliance exports). Watch for 30+ days before declaring an index unused.
Unique constraints are enforced via their indexes even when those indexes aren't used for SELECT. Don't remove them based on idx_scan alone.
Primary key indexes (_pkey) are always required.

The common over-indexing patterns

The composite index that duplicates its leading column. An index on (user_id, created_at) can satisfy queries that filter only on user_id, because it can range-scan the leading column. A separate single-column index on user_id is redundant if the composite index exists. But developers add it anyway because it looks like it might be faster for simple lookups. It isn't — the optimizer will use the composite index for those queries. The single-column index just exists to receive writes.

The index on a low-selectivity boolean column. An index on is_deleted where 50% of rows have is_deleted = false will almost never be used for reads — the optimizer knows the index saves almost no work compared to a sequential scan. But it still receives every write. The threshold varies, but indexes on columns with fewer than five distinct values and roughly even distribution are often never used for reads while always costing writes.

The migration index. Someone ran a backfill that needed fast lookups on a column, created an index for it, and then forgot to drop it when the migration finished. This index now receives every write indefinitely. Check for indexes created around the time of known migrations and verify they're still needed.

The ORM default. Some ORMs create indexes on every foreign key column by default. Sometimes this is right. Often it isn't — a foreign key from a small table to a large table may never be traversed from the small table's side in a real query, and the index on the small table's FK column just costs writes. Audit ORM-generated indexes with the same criteria as any other index.

Removing indexes safely

The risk of removing an index that's actually needed is that a query you run infrequently (the monthly report, the weekly batch job) suddenly does a sequential scan on a large table. The mitigation is time.

Watch pg_stat_user_indexes.idx_scan for at least 30 days, covering a full business cycle including month-end and quarter-end if those involve batch processes. An index with idx_scan = 0 after 30 days covering those windows is almost certainly safe to drop.

If you drop an index and realize it was needed, recovery is straightforward: CREATE INDEX CONCURRENTLY rebuilds the index without locking the table. The rebuild takes time proportional to table size, but it doesn't block production reads or writes. Treating index removal as low-risk because reversal is cheap is the right mental model.

What idx_tup_read vs idx_tup_fetch tells you

SELECT
  indexname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch,
  round(idx_tup_fetch::numeric / nullif(idx_tup_read, 0) * 100, 1) AS heap_fetch_pct
FROM pg_stat_user_indexes
WHERE idx_scan > 0
  AND idx_tup_read > idx_tup_fetch * 2  -- reading index but not fetching heap rows
ORDER BY idx_tup_read DESC;

idx_tup_read is how many index entries were read. idx_tup_fetch is how many heap rows were then fetched. If idx_tup_read is much larger than idx_tup_fetch, it can mean: the index is being used for sorting or grouping without needing to read the full row, or an index-only scan is satisfying queries from the index alone (and the heap fetch count reflects visibility map misses). If idx_scan > 0 but idx_tup_read = 0, the index is being used as a sort key in a context where it doesn't need to scan any entries — unusual but valid.

The main diagnostic question is simpler: is idx_scan above zero, and is it above zero across a long enough window to reflect your actual workload? An index with idx_scan = 0 after a full business cycle is a write tax with no offsetting read benefit. That's the index to remove.

The question isn't whether an index might someday be useful. It's whether it's useful now, regularly enough to justify the write cost on every insert, update, and delete. Most tables have at least one index that fails that test. Some have five.