Postgres VACUUM FULL vs pg_repack: Reclaiming Disk Space Without Downtime

VACUUM FULL is the textbook way to reclaim disk space in Postgres. It also takes an ACCESS EXCLUSIVE lock for the duration of the rewrite, which makes it unusable in production. pg_repack is the workaround everyone eventually finds.

Every long-running Postgres database eventually needs to reclaim disk space. The mechanism that produces the bloat is well understood: MVCC keeps old row versions visible until vacuum cleans them up, autovacuum keeps the bloat bounded under normal loads, and the table file grows when the bloat exceeds what autovacuum can keep up with. The discipline of monitoring dead tuple ratios catches the problem early. But once a table has gigabytes of bloat that autovacuum is no longer reclaiming, the question is what to do about it.

The textbook answer is VACUUM FULL. The textbook also notes that VACUUM FULL takes an ACCESS EXCLUSIVE lock on the table for the duration of the operation, which on a production table can mean hours of unavailability. We have not yet hit this problem at the scale of DocuMint, CronPing, FlagBit, or WebhookVault because SQLite handles the current load, but the right approach is well established and worth writing down before it becomes urgent.

Why bloat happens

Postgres uses MVCC for concurrency control. An UPDATE does not modify the row in place; it inserts a new tuple and marks the old one as obsolete. A DELETE marks a tuple as obsolete without removing it. The obsolete tuples are visible to transactions that started before the change and become invisible when no transaction can possibly see them. Vacuum reclaims the space they occupy and either returns it to the operating system (if it is at the end of the file) or makes it available for new tuples (if it is in the middle).

The bloat accumulates when vacuum cannot keep up with the rate of obsolete-tuple production, which happens under three patterns. High update rates on hot rows produce per-row bloat that scales with update count. Long-running transactions hold the visibility horizon back so that vacuum cannot remove tuples newer than the transaction start, which can grow the bloat by orders of magnitude during the long transaction. And bulk DELETE or UPDATE operations produce a sudden spike of bloat that autovacuum will eventually address but slowly.

What VACUUM does and does not do

The standard VACUUM command (without FULL) marks dead tuples as reusable space within the existing table file but does not return space to the operating system. The table file stays the same size, the bloat stays in place as fragmented free space, and new INSERTs use the freed slots before extending the file. This is fine for most workloads because the freed slots get reused.

The case where VACUUM is not enough is when the table has shrunk significantly (a large DELETE or partition drop) and the freed space is more than the table will reuse over a reasonable window. The table file holds onto disk space that is mostly empty, and the reads have to skip past the empty space to find the actual data. The space cannot be returned to the operating system without rewriting the table.

VACUUM FULL: correct but disruptive

VACUUM FULL rewrites the entire table to a new file with no bloat, then atomically swaps the new file in. The mechanism is correct and the result is exactly what you want: a compact table with all the dead space gone. The cost is that VACUUM FULL takes an ACCESS EXCLUSIVE lock on the table from the start of the rewrite to the swap, which blocks every other operation including SELECTs. The rewrite duration scales with table size, so a 100GB table can be hours of unavailability.

The lock duration is the disqualifying problem for production use. There is no per-batch chunking, no online variant in standard Postgres, and the lock cannot be released and reacquired. The discipline that works is to use VACUUM FULL only on tables that can afford the unavailability, which usually means small tables, archive tables that are not actively read, or maintenance windows long enough to absorb the rewrite.

pg_repack: the workaround everyone finds

pg_repack is a Postgres extension that achieves the same end result as VACUUM FULL but without the ACCESS EXCLUSIVE lock. The mechanism is to create a new table that mirrors the original schema, copy the live rows in chunks while a trigger captures any concurrent changes, replay the captured changes to the new table, and then take a brief exclusive lock to swap the new table in for the old one. The exclusive lock at the swap is measured in seconds rather than hours.

The trade-offs are the disk space cost (you need 2x the table size during the rebuild plus space for the change log), the modest CPU cost of the trigger that captures changes during rebuild, and the requirement that the table have a primary key or unique not-null index that pg_repack can use as the natural ordering for the rebuild. The disk space requirement is the constraint that usually surprises teams: a 100GB table needs 200GB of free space during the rebuild, plus headroom for ongoing writes and the change log.

pg_repack is a separate project from Postgres core, available as an extension package on most Postgres distributions. The installation is one CREATE EXTENSION away on managed Postgres services that include it. The operational use is one pg_repack command per table, with the rebuild running for hours on large tables but not blocking application traffic.

What pg_repack does not do

pg_repack rebuilds tables and indexes; it does not reduce the bloat in catalog tables, it does not address transaction-ID wraparound directly (though autovacuum handles that), and it does not work on tables without a usable primary key or unique index. Tables that fall into the last category are usually log-style append-only tables, and the right answer is often partitioning rather than rebuilding.

The other limitation is that pg_repack does not improve query performance per se. It reclaims disk space and reduces the size of sequential scans, which helps queries that touch large fractions of the table. Index-only scans on properly-indexed queries are largely unaffected by table bloat. The discipline of measuring before and after is worth applying because the operational cost is real and the benefit varies by workload.

Index bloat: a separate problem

Index bloat is a related but separate problem. Indexes accumulate bloat from the same MVCC mechanics: when a row is updated, the new tuple gets a new index entry, the old tuple's entry stays until vacuum cleans it up. Heavy update workloads on indexed columns produce index bloat that REINDEX (or REINDEX CONCURRENTLY in Postgres 12+) can reclaim.

REINDEX CONCURRENTLY is the online version of REINDEX, available since Postgres 12. The mechanism is similar to pg_repack: build a new index in parallel with the old one, validate that it is consistent, and swap atomically. The lock requirements are minimal and the operation can run on a live production index. The cost is the disk space for the new index during the rebuild and the modest performance impact of maintaining two indexes during the build.

The pattern that scales is to monitor index bloat alongside table bloat, to REINDEX CONCURRENTLY indexes that exceed a bloat threshold (often 30-50 percent), and to address the underlying update pattern when index bloat keeps recurring. Indexes on frequently-updated columns produce the most bloat, and partial indexes or HOT updates can sometimes reduce the rate.

The autovacuum tuning angle

Bloat accumulating to the point of needing pg_repack is often a symptom that autovacuum is not aggressive enough for the workload. The default autovacuum thresholds (20 percent dead tuples plus 50 row absolute floor) are conservative for large tables, where 20 percent of a billion-row table is 200 million dead tuples before autovacuum even starts. The per-table override pattern via ALTER TABLE SET autovacuum_vacuum_scale_factor is the right tuning lever.

The other autovacuum knob worth tuning is autovacuum_vacuum_cost_limit, which controls how aggressive autovacuum is about doing its work versus yielding to other workloads. The default of 200 is conservatively tuned for spinning-rust era hardware; on modern NVMe storage, values of 1000-2000 are reasonable and produce dramatically faster vacuum cycles. The trade-off is more I/O and CPU consumption during vacuum, which on a busy database can produce noticeable load.

The deeper observation

Disk-space reclamation is one of the database operational disciplines where the textbook answer is correct but disqualifying for production use, and the practical answer requires an extension that is not part of standard Postgres. The pattern recurs across operational concerns: the simple mechanism that works in development scales badly to production, and the production-scale mechanism is more complex but well established. The discipline that pays off is to learn the production-scale mechanism before you need it, not after the table has grown to the size where the simple mechanism would block production for hours.

The wider observation is that running Postgres in production is mostly about understanding the operational characteristics of mechanisms that are conceptually simple. Vacuum is conceptually simple, MVCC is conceptually simple, REINDEX is conceptually simple. The complexity is in the operational details: when each runs, how each interacts with concurrent workloads, what each costs in disk space and lock duration. The operational understanding is what separates teams that run Postgres reliably from teams that have a bad time with it.

Read more