REINDEX CONCURRENTLY: Rebuilding Postgres Indexes Without Blocking Writes

Indexes bloat. They corrupt rarely. They become suboptimal when their definition changes. The classic REINDEX takes an ACCESS EXCLUSIVE lock that production cannot tolerate. REINDEX CONCURRENTLY is the answer most teams do not know about.

Indexes do not stay in pristine condition. Update-heavy tables accumulate dead entries that vacuum cannot fully reclaim. Glibc collation upgrades change sort order silently and break unique constraints months later. New opclasses or column definitions make existing indexes suboptimal. A new statistics target reveals that an index choice was wrong. All of these situations want a rebuild.

The classic REINDEX TABLE foo answer takes an ACCESS EXCLUSIVE lock on the table for the duration of the rebuild. On a 50 GB table with a few indexes, that is a multi-minute outage. No production team can afford that, and so the rebuild gets postponed indefinitely while the bloat compounds.

Postgres 12 made REINDEX CONCURRENTLY generally available. It does the rebuild in three phases, holding only a SHARE UPDATE EXCLUSIVE lock that allows reads and writes throughout. The cost is that the rebuild takes longer in wall-clock time and consumes more disk. The benefit is that you can run it on a live production table during business hours.

How it actually works

The three phases are: build a new index in parallel with the old one, mark the new one valid, drop the old one. Specifically, Postgres creates a new index with a temporary name, scans the table to populate it, waits for any transactions that started before the build to complete (because they might be writing the table in ways the new index needs to capture), and then atomically swaps the names. The old index is dropped.

The wait between build and swap is where the operational subtlety lives. If a long-running transaction is sitting open, the concurrent reindex cannot complete. It will sit waiting until that transaction commits or aborts. This is the same behavior as CREATE INDEX CONCURRENTLY and the same diagnostic applies: check pg_stat_activity for transactions older than the start of the reindex.

The disk cost is roughly double the index size during the build, plus WAL volume proportional to the table size as the new index is populated. On a 50 GB table with a 5 GB index, expect 5-10 GB of extra disk usage and 50+ GB of WAL during the rebuild. Plan storage and replication lag accordingly.

What can go wrong

The most common failure mode is invalid indexes. If REINDEX CONCURRENTLY fails partway through, you can end up with an index named foo_pkey_ccnew or foo_pkey_ccold sitting around in invalid state. These need to be cleaned up manually with DROP INDEX. The query to find them is SELECT * FROM pg_index WHERE NOT indisvalid;

A retry attempt that runs into a leftover invalid index will fail until the old artifact is dropped. The discipline is to check for invalid indexes after any failed reindex attempt and clean them up before retrying.

The second failure mode is constraint validation. REINDEX CONCURRENTLY on a unique index will fail if the underlying data has somehow become non-unique while the rebuild was running. The most common cause is glibc collation drift on text columns: rows that compared equal under the old collation now compare differently, and what was a unique pair is now a duplicate. The fix is to find and resolve the duplicates first, then retry.

The third failure mode is when the reindex is genuinely too slow to complete within the maintenance window. If the table is being written aggressively, the new index has to keep up with every write that lands during the build. There is no good workaround other than scheduling the reindex during quieter hours, or using pg_repack for the rare cases where the table itself needs reorganization.

When to use REINDEX CONCURRENTLY

The four most common cases. First, after a major glibc upgrade that changed collation behavior, every text-column unique index needs a rebuild. The Postgres team has documented this issue extensively. The symptom is mysterious unique constraint violations months after an OS upgrade.

Second, after substantial deletion of rows from an indexed table. VACUUM reclaims index space but does not always shrink the index file. If you have deleted 80 percent of a 10 GB table, the index may still occupy most of its original space. A rebuild compacts it.

Third, when adding a new column definition or changing an opclass. REINDEX with the new definition pulls the index into line without dropping and recreating manually.

Fourth, as a periodic maintenance task on high-update tables. Most teams overestimate the need for this. Routine bloat is well-handled by autovacuum and HOT updates, and the rebuild cost is real. Reserve scheduled reindex for tables where pg_stat_user_indexes shows significant bloat that does not respond to vacuum.

The procedural pattern

The discipline that catches most issues. Before starting: check for long-running transactions with SELECT pid, query, state, xact_start FROM pg_stat_activity WHERE xact_start < now() - interval '5 minutes'; and resolve them. Check disk space: ensure free space at least equal to the largest index being rebuilt. Check replication lag if you have replicas: a large reindex will produce significant WAL and may saturate the replication channel.

Run with statement_timeout = 0 for the session to prevent the rebuild from timing out. Monitor progress with pg_stat_progress_create_index (available since Postgres 12): it shows phase, blocks done, blocks total, and current operation. A rebuild that has stopped progressing is a sign of a blocking transaction.

After completion, verify with SELECT * FROM pg_index WHERE NOT indisvalid; that no invalid indexes were left behind. Run ANALYZE on the table to refresh statistics, because the planner uses index size as a cost input and the size has just changed.

The single-index alternative

When you need to rebuild only one index, REINDEX INDEX CONCURRENTLY foo_idx targets just that index. This is usually cheaper and faster than rebuilding the whole table's indexes, and the disk overhead is bounded by the one index size. For routine maintenance, prefer per-index rebuilds over table-wide rebuilds.

For unique constraints created by ALTER TABLE ADD CONSTRAINT, the underlying index can still be rebuilt concurrently. The constraint itself does not need to be touched. This is the common case for primary keys and unique columns: rebuild the index, the constraint continues to enforce.

What this does not replace

The rebuild handles bloat and stale collation. It does not handle wrong index choice (you need to add or drop the index for that, with CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY). It does not handle table-level bloat (you need pg_repack or careful VACUUM FULL with downtime). It does not handle statistics drift (you need ANALYZE).

The mental model is that REINDEX CONCURRENTLY is one tool in a maintenance toolkit, alongside CREATE INDEX CONCURRENTLY and DROP INDEX CONCURRENTLY and VACUUM and ANALYZE. Together they cover most online maintenance needs. The combination is what makes long-running Postgres production deployments possible without scheduled downtime.

What we do across the four products

Across DocuMint, CronPing, FlagBit, and WebhookVault, we run on SQLite, where the analogue is VACUUM with optional REINDEX. SQLite does not have the same online rebuild story because it does not have the same write concurrency story: writes serialize through a single writer anyway, and the rebuild simply joins that queue.

The Postgres patterns become relevant the moment a product graduates from SQLite to PostgreSQL, which we expect to happen for whichever product first crosses the write-rate threshold where a single SQLite writer becomes a bottleneck. The migration is one of the cases where REINDEX CONCURRENTLY earns its keep: the new Postgres deployment can be rebuilt periodically without affecting users.

The deeper observation is that Postgres operational tooling is unusually mature for a database that runs in commodity environments. The combination of CONCURRENTLY variants for index DDL, online VACUUM, pg_stat family for observability, and the WAL-based replication story makes 99.9-percent-availability production deployments achievable without specialized expertise. The cost is learning the operational vocabulary. The benefit is that the vocabulary stays useful for decades.

Read more