Postgres pg_repack: Reclaiming Table Bloat Without Locking

AuthorVeraDomainDatabases / PostgresDepth#depth-2Type#type-deep-dive

VACUUM FULL works. CLUSTER works. They both reclaim bloat, rebuild indexes, and hand back disk space. They also hold an ACCESS EXCLUSIVE lock for the entire operation, which means your table is offline. For a 50GB table on a production system, that's the kind of maintenance window that requires a calendar invite.

pg_repack is the answer. It rebuilds tables online with only a brief lock at the end. If you have tables with significant bloat and can't afford downtime, this is the extension you need.

How pg_repack Works

The mechanism is straightforward once you see it:

Creates a shadow table with the same schema.
Copies rows into the shadow table in the desired order (clustered index order, if specified).
Installs row-level triggers on the original table to capture concurrent writes (INSERT, UPDATE, DELETE) into a log table throughout the copy phase.
Replays the captured log entries into the shadow table to catch up with concurrent changes.
Acquires an ACCESS EXCLUSIVE lock briefly — just long enough to swap the tables via catalog rename.
Drops the original (now old) table.

The lock duration is proportional to the number of unprocessed log entries at swap time, which is typically milliseconds on a non-firehose table. This is fundamentally different from VACUUM FULL, which holds the lock for the entire duration of the rebuild.

-- Install the extension (requires superuser)
CREATE EXTENSION pg_repack;

-- Repack a single table
SELECT pg_repack.repack_table('orders');

-- Or use the CLI tool (runs outside the database)
pg_repack -h localhost -d mydb -t orders

-- With CLUSTER order (reorders rows by index)
pg_repack -h localhost -d mydb -t orders --order-by created_at

When to Use It

The canonical use case is a table where autovacuum can't keep up with dead tuple accumulation. Check your candidates:

SELECT
  schemaname || '.' || relname AS table_name,
  n_dead_tup,
  n_live_tup,
  round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
  pg_size_pretty(pg_total_relation_size(schemaname || '.' || relname)) AS total_size
FROM pg_stat_user_tables
WHERE n_live_tup + n_dead_tup > 10000
ORDER BY dead_pct DESC NULLS LAST
LIMIT 20;

Tables with more than 50% dead tuples where autovacuum hasn't resolved it are good candidates. Tables that received a bulk delete or bulk update are also candidates — even after autovacuum runs, the physical file doesn't shrink unless VACUUM FULL or pg_repack reclaims the space.

What You Need to Know Before Running It

Disk space: pg_repack creates a full copy of the table as a shadow. You need free disk space equal to the current table size (including bloat). If your 40GB table is 70% bloat, you still need 40GB free — the shadow copy starts full size.

Replication slot lag: On long-running repacks of large tables, logical replication slots can accumulate WAL because the triggers generate write volume. Monitor pg_replication_slots.confirmed_flush_lsn lag during the operation. If you're running replicas, coordinate with your replication setup before repacking very large tables.

Index rebuild time: pg_repack rebuilds all indexes on the table as part of the operation. The total time is dominated by the copy phase plus index reconstruction, not the final lock acquisition. For a 200GB table with 12 indexes, plan for hours.

Superuser or replication privilege: pg_repack needs sufficient privileges to create triggers and manipulate catalog entries. Typically superuser or at minimum REPLICATION privilege in Postgres 10+.

pg_repack vs pg_squeeze

pg_squeeze is a newer alternative with similar goals. The main differences:

pg_squeeze uses logical decoding instead of triggers to capture concurrent writes, which generates less overhead during the copy phase.
pg_squeeze is less widely deployed and has fewer production miles behind it.
pg_repack has a CLI tool; pg_squeeze is pure SQL.
Both require the extension installed on the server.

For most shops, pg_repack is the safer choice simply because it's been battle-tested longer. pg_squeeze is worth evaluating if you're on Postgres 14+ and want lower WAL overhead during repack operations.

What pg_repack Does Not Solve

Index-only bloat: If your table itself is healthy but a specific index has become bloated (common with btree indexes on append-mostly tables with high delete rates), pg_repack will rebuild the index as part of the table repack — but you can also use REINDEX CONCURRENTLY for index-only rebuilds without repacking the entire table.

Small tables: For a 10MB table, VACUUM FULL takes seconds and the downtime is trivial. Don't use pg_repack for small tables. The overhead of setting up triggers and shadow tables isn't worth it for anything that VACUUM FULL handles in under 30 seconds.

Root cause: pg_repack fixes the symptom, not the cause. If your update patterns generate excessive bloat, autovacuum tuning (lowering autovacuum_vacuum_scale_factor for hot tables, increasing autovacuum_vacuum_cost_delay patience) is the thing to fix. pg_repack is emergency maintenance, not a substitute for proper vacuum configuration.

Building in public? builds.anethoth.com is a directory of software projects with public build dossiers — shipped milestones, known limitations, and proof the work is real.