Postgres CREATE INDEX CONCURRENTLY: Adding Indexes to Production Tables Without Blocking Writes

CREATE INDEX takes an ACCESS EXCLUSIVE lock for the duration of the build. On large tables that means minutes to hours of blocked writes. CREATE INDEX CONCURRENTLY trades roughly double the build time for non-blocking behavior, and the trade is almost always worth it in production.

CREATE INDEX is one of the simplest-looking DDL statements in Postgres and one of the easiest to get badly wrong in production. The default behavior takes an ACCESS EXCLUSIVE lock on the target table for the entire duration of the build, which blocks every read and every write on the table until the index finishes. For a small table or a non-production environment this is fine. For a 50-gigabyte table on a live production system it is a multi-hour outage.

CREATE INDEX CONCURRENTLY is the alternative that exists for exactly this case. It allows reads and writes to continue against the table while the index builds, at the cost of roughly twice the total build time and a more complex failure mode. The trade is almost always worth it in production, but understanding the mechanism is necessary to operate it confidently.

What CONCURRENTLY actually does

The non-concurrent build is one phase: scan the table once with an ACCESS EXCLUSIVE lock, build the index, mark it valid, release the lock. The concurrent build is three phases. First, scan the table to build the initial index using a SHARE UPDATE EXCLUSIVE lock that does not block concurrent reads or writes. Second, wait for all transactions that began before the build started to complete, so the second pass can pick up any rows they wrote. Third, do a second pass of the table to incorporate any concurrent changes, then mark the index valid.

The waiting phase between the first and second passes is one of the most commonly misread aspects of the operation. It looks like the build has stalled, but it is actually waiting for a transaction that started before the CREATE INDEX CONCURRENTLY ran and has not yet committed. Long-running transactions, including idle-in-transaction sessions, block the second phase indefinitely. The diagnostic is to query pg_stat_activity for old transaction start timestamps.

The total work done is about twice that of a non-concurrent build: two full table scans plus the waiting plus the lock-acquisition overhead. The wall-clock time is usually two to three times longer than the equivalent non-concurrent operation. For a one-hour non-concurrent build, expect a two-to-three-hour concurrent build.

The invalid index problem

The most operationally important quirk of CREATE INDEX CONCURRENTLY is that a failure midway leaves an invalid index on the table. The index exists in the system catalog but is marked invalid and is not used by the planner. It still imposes the write overhead of an index, because every write must maintain it, but provides none of the read benefit. The disk space is consumed and the writes are slower.

A failed concurrent index build requires manual cleanup. The right pattern is to detect the invalid index via pg_index.indisvalid = false, drop it explicitly with DROP INDEX CONCURRENTLY, and retry the build after fixing the underlying cause. The most common causes of failure are statement_timeout firing mid-build, lock-timeout firing on the catalog updates, disk space exhaustion, or a manual cancellation. The diagnostic query is:

SELECT i.relname AS index_name, t.relname AS table_name
FROM pg_index ix
JOIN pg_class i ON i.oid = ix.indexrelid
JOIN pg_class t ON t.oid = ix.indrelid
WHERE ix.indisvalid = false;

Some teams add this query to their monitoring dashboard with an alert at any non-zero result. Invalid indexes consume disk and writes silently, and a forgotten invalid index can stay in production for months before someone notices.

Configuration before the build

Several configuration choices materially affect how a concurrent index build runs. The most important is statement_timeout, which must be set to 0 for the session running the build or the build will eventually time out and leave an invalid index. The pattern is SET statement_timeout = 0; SET lock_timeout = 0; before the CREATE INDEX CONCURRENTLY statement.

The maintenance_work_mem setting controls how much memory the build can use for sorting. Larger values produce faster builds for tables that fit in memory, up to a few gigabytes for very large tables. The default of 64MB is conservative and worth raising for any non-trivial build. Set it for the session before the build via SET maintenance_work_mem = '2GB'.

The max_parallel_maintenance_workers setting allows index builds to use multiple workers since Postgres 11. The default of 2 is reasonable but can be raised on hosts with spare CPU. Each worker consumes maintenance_work_mem so the total memory cost is multiplicative; budget accordingly.

The autovacuum interaction is subtle. A concurrent index build prevents autovacuum from processing the target table for the duration. On heavily-updated tables this can produce noticeable dead-tuple buildup, particularly for hours-long builds. Plan for an ANALYZE and possibly a manual VACUUM after the build completes.

The unique-index trap

CREATE UNIQUE INDEX CONCURRENTLY has a particularly nasty failure mode. If any duplicate values exist in the column at the time of the build, the build fails late in the process, after most of the work has been done, and the failure produces an invalid index that must be dropped before retry. The duplicate check happens at the end of the second pass, so a build that has been running for two hours can fail in the last minute.

The standard pattern for adding a unique constraint to a live table is to first add a regular unique index concurrently, verify it is valid, then add the constraint via ALTER TABLE ADD CONSTRAINT ... UNIQUE USING INDEX, which is a metadata-only operation that takes a brief ACCESS EXCLUSIVE lock. The constraint creation cannot fail because the index has already been validated.

For tables where duplicate detection is the actual goal, a separate audit pass before the build is usually faster than letting the build itself detect duplicates. The audit can run as SELECT col, count(*) FROM table GROUP BY col HAVING count(*) > 1 with appropriate work_mem to use a hash aggregation, and the results can be reviewed and resolved before the index build starts.

Replica interaction

On systems with streaming replication, a CREATE INDEX CONCURRENTLY on the primary replicates to replicas as a regular CREATE INDEX, which takes an ACCESS EXCLUSIVE lock on the replica. The standby cannot serve reads against the locked table for the duration of the index build on the replica. For asynchronous replicas this is usually acceptable; for synchronous replicas with read traffic it can be disruptive.

The mitigation is to schedule index builds during low-traffic windows on the replica, or to use hot_standby_feedback to avoid statement cancellation, or to accept the temporary unavailability of the table for replica reads. The non-concurrent build on the replica typically takes the same time as the original non-concurrent build would have, not the longer concurrent build time, because the replica is replaying a logged operation rather than rebuilding the index from scratch.

What CREATE INDEX CONCURRENTLY does not solve

The mechanism does not eliminate the operational cost of large index builds entirely. It eliminates the blocking-writes cost, which is usually the dominant cost in production. It does not eliminate the disk space cost, which is roughly equal to the final index size during the build and the full final size after. It does not eliminate the I/O cost on the table being indexed, which can saturate disk bandwidth on large tables and indirectly slow other operations.

It also does not work within a transaction block. CREATE INDEX CONCURRENTLY must be executed outside a transaction, which means it cannot be combined with other DDL in an atomic migration. Migration tools that wrap every operation in a transaction by default need explicit opt-out for concurrent index builds. Most modern migration tools support this via an annotation or per-statement configuration, but it is a common source of migration failures for teams new to the pattern.

Three operational signals to monitor

First, the count of invalid indexes via pg_index.indisvalid = false. This should be zero in steady state. A non-zero value indicates either a failed build that has not been cleaned up or an in-progress build, both of which deserve attention.

Second, the long-running-transaction count, particularly for sessions with state = 'idle in transaction'. A long-running transaction that started before a concurrent index build will block the second phase indefinitely. The diagnostic query selects from pg_stat_activity where state in ('active', 'idle in transaction') and xact_start is older than a threshold.

Third, the disk-space headroom on the data volume. A large concurrent index build can consume tens or hundreds of gigabytes of disk and the build will fail mid-stream if the disk fills. The standard ops practice is to confirm headroom equal to roughly 1.5 times the expected final index size before starting any large concurrent build.

Application across our four products

The four products run on SQLite, which has a simpler index story without ACCESS EXCLUSIVE locks at table scope. Index creation in SQLite takes a shorter lock at the database level, and the locks SQLite uses are at the connection level rather than the row level used by Postgres. For our scale this is not a constraint. The Postgres migration plan includes index review as one of the migration steps, with the understanding that any index addition on a non-trivial table after migration will require CREATE INDEX CONCURRENTLY.

The deeper observation is that CREATE INDEX CONCURRENTLY is the kind of feature whose absence is invisible in development and whose presence becomes load-bearing at production scale. The teams that have not learned to reach for it instinctively are the ones that have had a production index build lock a critical table for hours. The teams that have learned it usually learned it the hard way, and the lesson sticks.

Our products: DocuMint (PDF invoice generation API), CronPing (cron job monitoring with status pages), FlagBit (feature flags API for modern teams), and WebhookVault (webhook capture and replay) keep the lights on.

Read more