Postgres COPY: Bulk Loading Without Killing Production

INSERT statements are the wrong tool for bulk loads. COPY moves data into Postgres at 10-100x the throughput, but the default invocation will lock the table, blow out the WAL, and starve concurrent queries. The patterns that survive production are about what you do around COPY, not COPY itself.

The first time a team discovers that Postgres has a COPY command, the reaction is usually some version of "wait, why have we been doing INSERTs all this time." A loop of INSERT statements that takes 40 minutes to ingest a million rows will often finish in 30 seconds with COPY. The performance difference is large enough that for any data load over a few thousand rows, INSERT is the wrong tool.

The harder lesson comes the first time a team uses COPY in production without thinking about what is around it. The naive COPY my_table FROM '/tmp/data.csv' will hold an exclusive lock on the destination table, generate WAL faster than the replicas can apply it, blow out the buffer cache, and starve every concurrent query for the duration of the load. Used carelessly, COPY is a tool for taking your database offline efficiently. The patterns that survive production are about what you do around COPY, not COPY itself. We have used these patterns across DocuMint, CronPing, FlagBit, and WebhookVault for backfills, migrations, and analytics extracts.

Why COPY is fast

COPY is fast because it skips the work that INSERT does per row. Each INSERT goes through SQL parsing, query planning, permission checking, trigger firing, and constraint validation as a separate transaction. The per-row overhead is fixed at a few hundred microseconds. For a single row that is unobservable; for a million rows it is the entire query duration.

COPY parses the input format once, plans the load once, and streams the rows into the table with minimal per-row overhead. Constraint and trigger validation still runs (unless you suppress it), but the parsing, planning, and protocol overhead are amortized across the entire load. The result is throughput in the range of 100,000 to 1,000,000 rows per second on modern hardware, compared to 1,000 to 10,000 rows per second for unbatched INSERT.

The default invocation problem

A bare COPY my_table FROM '/path/to/data.csv' has three production-hostile defaults. First, it runs in a single transaction, which means the entire load is one giant write that holds locks for its duration. Second, it generates WAL for every row at full speed, which can overwhelm replication. Third, the rows it loads compete for the buffer cache with the queries your live traffic is running, so the cache hit rate for production queries drops during the load.

The fix is to chunk the load. Instead of one COPY of a million rows, run twenty COPYs of fifty thousand rows each. Each chunk completes quickly, locks are released between chunks, and you have natural pause points where you can throttle. The total load takes slightly longer than the unchunked version (because there is per-chunk overhead) but it costs production almost nothing.

The unlogged-table pattern

For loads where the destination table is empty (an initial import or a complete refresh of a derived table), the fastest pattern is to create the table as UNLOGGED, COPY into it, and then convert it to a logged table. Unlogged tables do not write to the WAL, which can speed up the load by 2-5x on write-heavy systems. The downside is that unlogged tables are truncated on crash; for an initial load this is fine because you can just re-run the import.

After the COPY completes, run ALTER TABLE my_table SET LOGGED to enable WAL for subsequent writes. The conversion itself is fast because it just changes the table's metadata; the existing data is not rewritten. This pattern is correct for tables that exist for the purpose of receiving a load and do not need crash recovery during the load itself.

Indexes and constraints during load

Every index and constraint on the destination table costs CPU time per inserted row. For loads into an empty table, the right pattern is to drop the indexes, run the COPY, and recreate the indexes. Building an index in one pass over the loaded data is faster than building it incrementally during the load, often by 2-3x.

For loads into a non-empty table, dropping indexes is rarely possible because production traffic depends on them. The next best thing is to use CREATE INDEX CONCURRENTLY on any new indexes you need on the loaded data, scheduled after the load completes, so the index build does not block writes. Foreign-key constraints can be deferred using ALTER TABLE ... ALTER CONSTRAINT ... DEFERRABLE and then validated in a single pass after the load, which is faster than validating row-by-row during the load.

The transactional outbox for backfills

A common case is backfilling a column that was added to an existing table. The temptation is to run a single UPDATE over the entire table, which is the worst possible approach: it holds an ACCESS EXCLUSIVE lock for the duration, generates WAL for every row, and bloats the table because every updated row produces a new tuple with the old one needing later vacuum.

The right pattern is chunked updates with explicit transaction boundaries. Process the table in batches of a few thousand rows at a time, commit between batches, and pause briefly between batches to let autovacuum and replication catch up. The total time is longer than the single-UPDATE approach but each batch holds locks for milliseconds rather than minutes, and production traffic is essentially unaffected.

What to do during the load

Five operational signals to watch during a large load. First, replication lag on every replica. A load that gets ahead of replication risks the replicas falling so far behind that they cannot catch up; pause if lag exceeds 30 seconds. Second, WAL generation rate, which is correlated with replication risk and disk fill. Third, buffer cache hit rate for production queries, which should not drop below baseline by more than 5-10%. Fourth, locks held by the load process, which should be the destination table only and held for batch duration only. Fifth, autovacuum status, because chunked updates produce dead tuples that autovacuum needs to clean up; if vacuum cannot keep up with the load rate, table bloat grows unboundedly.

The deeper observation

COPY is the fastest path to getting data into Postgres, and the fastest path is also the path where the most operational damage happens if you do not pay attention to what is around it. The chunking, unlogging, index-management, and pause-discipline patterns that make COPY production-safe are not technical innovations; they are just the basic discipline of treating a high-throughput operation with the respect that high-throughput operations deserve. The teams that get this right are the teams that stop thinking of COPY as a single command and start thinking of it as a small protocol with steps before, during, and after.

Read more