UPSERT and INSERT ON CONFLICT: Patterns for Atomic Inserts and Updates

The check-then-insert race condition is one of the oldest bugs in database programming. Postgres INSERT ON CONFLICT and the SQL standard MERGE give you atomic alternatives, but each has sharp corners that bite teams who reach for them without understanding the semantics.

The check-then-insert race is one of the oldest bugs in database programming. The application checks whether a row exists, finds that it does not, then inserts it. Between the check and the insert, another transaction does the same thing, and the application either gets a unique-constraint violation or, worse, ends up with duplicate rows because the application forgot to declare a unique constraint. Every team writes this bug at least once, and every team eventually wants the atomic version.

Postgres has had INSERT ... ON CONFLICT since version 9.5 in 2016. The SQL standard MERGE statement was added in Postgres 15 in 2022. Both give you atomic insert-or-update semantics, but each has sharp corners that bite teams who reach for them without understanding what they actually do. We use upsert patterns across DocuMint (idempotent invoice creation), CronPing (monitor heartbeat updates), FlagBit (flag definition synchronization), and WebhookVault (duplicate event deduplication), and the failure modes are consistent enough to be worth explicit treatment.

The basic pattern: INSERT ON CONFLICT DO UPDATE

The most common upsert in Postgres looks like this:

INSERT INTO monitors (token, last_ping_at, status)
VALUES ('abc123', now(), 'up')
ON CONFLICT (token) DO UPDATE
SET last_ping_at = EXCLUDED.last_ping_at,
    status = EXCLUDED.status;

The EXCLUDED pseudo-table refers to the row that would have been inserted if there had been no conflict. The conflict target (here token) must match an existing unique constraint or unique index; without one, Postgres has no way to know what counts as a duplicate, and the statement will fail at parse time.

The semantics are atomic in the sense that the insert and the update happen in a single statement, so no second transaction can race between them. They are also atomic in the sense that the row's identity is preserved: id values do not change, foreign keys pointing at the row remain valid, and indexes on the row are maintained incrementally rather than dropped and rebuilt.

The DO NOTHING variant: deduplication without overwriting

The other common form is ON CONFLICT DO NOTHING, which makes the insert silently fail rather than update:

INSERT INTO processed_events (event_id, processed_at)
VALUES ('evt_abc', now())
ON CONFLICT (event_id) DO NOTHING;

This is the right pattern for idempotency tracking: the application wants to know whether the event has been seen, but if it has, the existing row is the authoritative record. WebhookVault uses this exact pattern for duplicate event detection: the same event arriving twice causes the second insert to be silently dropped, and the consumer side reads the inserted row count to know whether to process the event.

One subtle behavior: RETURNING on a DO NOTHING statement returns rows only for actually-inserted rows. The pattern INSERT ... ON CONFLICT DO NOTHING RETURNING id is a clean way to ask "did I insert this row, or was it already there?" The application can branch on whether the result set is empty.

The sequence and trigger gotcha

The most surprising behavior of INSERT ON CONFLICT is that even in the conflict case, the row attempt counts. Specifically:

  • Sequence values are consumed. If the table has a SERIAL or IDENTITY column, every conflicting insert burns a sequence value. A high-volume upsert workload can produce large gaps in the sequence, and in rare cases can exhaust the sequence on a 32-bit integer column much faster than the row count would suggest. The workaround is to use BIGINT identity columns from the start.
  • BEFORE INSERT triggers fire on the inserted row even when the row ends up conflicting. The trigger sees the would-be row, can modify it, and the modified row participates in the conflict check. This is occasionally useful but more often surprising.
  • AFTER INSERT triggers do not fire on conflicting rows, but AFTER UPDATE triggers do fire if the conflict resolution is an update. The trigger logic has to account for the fact that the same logical "upsert" call may fire either insert or update triggers.

The MERGE alternative and its trade-offs

Postgres 15 added MERGE, the SQL standard's answer to upsert. The syntax is more flexible and more verbose:

MERGE INTO monitors AS t
USING (VALUES ('abc123', now(), 'up')) AS s(token, ping_at, status)
ON t.token = s.token
WHEN MATCHED THEN UPDATE SET last_ping_at = s.ping_at, status = s.status
WHEN NOT MATCHED THEN INSERT (token, last_ping_at, status)
VALUES (s.token, s.ping_at, s.status);

The advantages over INSERT ON CONFLICT: MERGE can use any matching condition (not just a unique constraint), it can express different actions for different conditions, and it can delete rows as part of the same statement. The disadvantages: it is more verbose, it does not have RETURNING in standard SQL (Postgres 17 added it as an extension), and the concurrency semantics are subtly different.

The concurrency subtlety is the one that catches teams. INSERT ON CONFLICT uses a special speculative-insert mechanism that handles concurrent inserts of the same key without ever raising a unique-constraint error. MERGE does not have this mechanism: under concurrent execution, two transactions may both decide that no matching row exists, both attempt to insert, and one will get a unique-constraint violation. The application has to either retry the violation or hold a lock. For workloads where the conflict rate is low, this is rarely a problem; for high-conflict workloads, INSERT ON CONFLICT is more robust.

Bulk upserts and the locking question

The single-row upsert is well-behaved. Bulk upserts of thousands of rows are where the locking patterns matter. A bulk INSERT ON CONFLICT DO UPDATE takes a row-level lock on each affected row in the order the rows are processed, which is determined by the order they appear in the VALUES clause and the order they are physically located in the table.

The deadlock failure mode: two bulk upserts that touch overlapping sets of rows in different orders will deadlock. The fix is to sort the values by the conflict key before issuing the upsert, so that two concurrent bulk upserts process rows in the same order and only one blocks waiting for the other. This is the same lock-ordering discipline that prevents deadlocks in any concurrent system, applied to bulk database operations.

The bloat failure mode: a bulk upsert that updates many rows generates dead tuples proportional to the update count, the same as any large UPDATE. Autovacuum will eventually clean these up, but a workload of repeated full-table upserts can outpace autovacuum and produce significant table bloat. The mitigation is the same as for any update-heavy workload: smaller batches, autovacuum tuning, and occasional VACUUM FULL or pg_repack runs during low-traffic windows.

The DO NOTHING DO UPDATE choice

A pattern question that comes up regularly: should the application use DO NOTHING and then explicitly UPDATE if needed, or use DO UPDATE directly?

The two patterns have different observability properties. DO UPDATE always writes, which means the row's updated_at timestamp and any version columns will be updated even when nothing actually changed. This produces extra WAL volume, extra dead tuples, and can mislead applications that watch updated_at to detect actual changes. DO NOTHING followed by a conditional UPDATE preserves the property that updated_at reflects an actual change.

The WHERE clause on DO UPDATE is the underused middle ground: ON CONFLICT (token) DO UPDATE SET status = EXCLUDED.status WHERE t.status IS DISTINCT FROM EXCLUDED.status. This only writes when the value actually changed and avoids the no-op write problem while keeping the single-statement atomicity.

What this is and is not

The honest summary: INSERT ON CONFLICT is the right default for most upsert needs in Postgres. It handles concurrent inserts robustly via speculative insertion, integrates cleanly with returning clauses, and the surprises are documented well enough that a team that reads the documentation will avoid them. MERGE is the right answer when the logic is more complex than insert-or-update or when SQL portability matters.

Neither is a substitute for thinking about the schema. The unique constraint or unique index that supports the upsert is the load-bearing piece, and the upsert statement is just the syntactic sugar that uses it atomically. Teams that reach for upsert without first ensuring that the table has the right unique constraints write the same race condition in a different shape, and discover it the same way.

Read more