Database Sequences and Auto-Increment Without Gaps: A Field Guide

Database-generated IDs almost always have gaps, and most of the time that's fine. The cases where it isn't fine require different machinery, and the cost of that machinery is worth understanding before you reach for it.

One of the early surprises a developer encounters when working with relational databases is that the IDs in an auto-increment column are not consecutive. A table with 100 rows has IDs that skip from 47 to 52, then jump to 89, then continue. The gaps look wrong, like data has been deleted, but no deletion has happened. The behavior is by design, and understanding why it's there explains both why most systems shouldn't fight it and why the small number of systems that genuinely need gapless sequences pay a real performance cost for them.

The patterns in this post matter for any system using a relational database for ID generation. We use sequences for the primary keys in DocuMint invoices, CronPing monitors, FlagBit flags, and WebhookVault request logs, and we accept gaps as the right tradeoff in all four cases.

How sequences actually work

A Postgres sequence is a separate object from any table. It holds a current value and increments atomically when called. The integration with auto-increment columns happens through the DEFAULT clause: when you insert a row without specifying the ID column, Postgres calls nextval() on the sequence and uses the returned value as the column value. The sequence call is its own transaction — it commits independently of the surrounding transaction in the same connection.

This separation is the key design decision. If sequences were tied to the surrounding transaction, every transaction that called nextval() would block other transactions calling nextval() until the first one committed. The sequence would be a hot spot serialized across all writers. By committing the sequence advance independently, Postgres lets thousands of transactions in flight each acquire their next ID without waiting on each other. The cost of this design is that a transaction can call nextval() and then roll back, leaving the assigned ID unused. The ID has been consumed by the sequence and won't be reused.

The same dynamic applies to MySQL's AUTO_INCREMENT column attribute, SQL Server's IDENTITY columns, and Oracle's sequences. The mechanisms differ in detail — MySQL's innodb_autoinc_lock_mode controls whether IDs are reserved at insert time or reserved in batches, for example — but the principle is identical: high concurrency requires that ID reservation be decoupled from transaction commit, and that decoupling produces gaps when transactions abort.

The everyday sources of gaps

Transaction rollback is the most common source. Any INSERT statement that's part of a transaction that subsequently aborts produces a gap. The abort can be explicit (the application calls ROLLBACK), implicit (a constraint violation aborts the transaction), or external (the connection drops mid-transaction). All of these consume IDs from the sequence without using them.

Sequence caching is the second source. Postgres allows sequences to be configured with CACHE n, which has the sequence pre-allocate n IDs to a connection in a single trip. The connection then uses those IDs for subsequent inserts without going back to the sequence. If the connection closes before using all of its cached IDs, the unused IDs are lost. The same pattern applies to MySQL's innodb_autoinc_lock_mode=2 and Oracle's CACHE setting.

Server crashes are the third source. When a database crashes after committing a sequence advance but before the next checkpoint, the sequence's on-disk state is restored to the last checkpoint. To prevent the database from reissuing already-used IDs after recovery, Postgres advances the sequence to the next safe value, which may be ahead of the last actually-used ID. The gap from the last used ID to the new safe value is permanent.

Why the gaps are usually fine

For internal database keys whose only purpose is to uniquely identify rows, gaps are irrelevant. The keys serve their purpose whether they're 1, 2, 3, 4 or 1, 2, 5, 47 — they identify rows uniquely and they support index lookups. There's no semantic meaning attached to consecutiveness, and no application logic depends on it.

For surrogate keys exposed to users, the gaps are usually still fine. A user looking at invoice number 1047 and then invoice number 1052 is mildly aware of the gap but doesn't typically extract meaning from it. Most users assume the gap represents normal business operations — perhaps some invoices were canceled, or invoices were issued out of order. The mental model accommodates gaps without requiring explanation.

For audit log keys, gaps are not just acceptable but useful. The presence of a gap in audit IDs while the timestamp range is continuous is a signal that something was deleted from the audit log, which an auditor wants to detect. A gapless sequence would actively hide the deletion.

The cases where gaps are not acceptable

Three categories of system genuinely need gapless sequences. The first is regulated invoice numbering. Several countries — Italy, Mexico, Brazil among others — require that tax invoices be numbered sequentially without gaps so that auditors can verify completeness. Missing an invoice number is a tax compliance issue. The standard pattern is to issue a separate invoice number from the database key, drawn from a gapless sequence that's incremented only after the invoice is committed.

The second is regulated transaction logging in financial systems. Some regulators require sequential ordering of trade records or settlement records, with gaps requiring explicit notation and explanation. The same pattern applies: separate the regulated sequence from the database key.

The third is when the sequence value will be used for cryptographic operations that require unique non-skipping values. This is rare and usually a sign the design should be reconsidered. The right answer is almost always to use UUIDs or content-addressed identifiers for cryptographic purposes and let the database key be whatever the database produces.

The patterns for gapless sequences

The simplest gapless pattern is a counter table. A single row holds the next value. Each operation that needs a value does SELECT FOR UPDATE on the row, reads the current value, increments and writes back, and uses the value. The transaction's lock on the row prevents two concurrent operations from getting the same value. The cost is serialization: every operation that needs a value blocks behind every other operation that needs a value. Throughput is limited to roughly one operation per transaction round-trip, which on a single-instance database is hundreds per second, not thousands.

The optimization is to issue values in batches. A worker reserves a range of, say, 100 values from the counter table in one transaction, then uses those values without further coordination. The cost is that batch boundaries can leave unused values when a worker doesn't use its full batch — which is a gap, defeating the gaplessness goal. So this only works for sequences whose use can be made batch-aware.

The deferred-numbering pattern is often the right answer. Database rows are created with regular auto-increment IDs and a NULL "official_number" column. A separate process — usually a single-threaded worker, or a transactional batch job — assigns official numbers in order from a counter table at the moment of finalization. If a row is canceled before finalization, no official number is consumed. The official number is gapless even though the database keys have gaps.

The practical guidance

For internal keys, accept gaps. The sequence semantics that produce them also produce the throughput that lets the database scale, and the gaps don't matter for the keys' purpose. Don't reset sequences "to clean up" — every reset risks ID collisions if any logic anywhere caches old IDs.

For user-facing keys, accept gaps unless you have a specific reason not to. Most user-facing identifiers benefit from being non-sequential anyway — a gappy sequence prevents customers from inferring volume metrics by counting consecutive IDs they receive. If you want a user-facing identifier that obscures volume, use a separate UUID column or a content-derived hash, not a contortion of the database sequence.

For regulated sequences, use the deferred-numbering pattern. Do not try to make the database's auto-increment column gapless — every workaround will eventually fail, and the failure mode is that you've issued duplicate or out-of-order numbers in production with regulatory consequences.

The deeper observation

The gap-tolerance of database sequences is one of those design decisions whose rationale isn't obvious until you understand the alternative. Every modern database that achieves more than a few hundred writes per second has the same property, because the alternative — locking the sequence across transaction boundaries — was tried in older databases and produces unacceptable concurrency. The lesson is not that gaps are an error or a flaw, but that they're the price the database pays to give us the throughput we depend on. The cost shows up at the surface as small visual oddities (invoice 1047, invoice 1052) and the work for the developer is to understand which surfaces those oddities matter on and which they don't. The wrong answer is to fight the database's design; the right answer is to use the database for what it's good at and add a separate, smaller, slower mechanism only for the narrow cases where gaplessness is genuinely required.

Read more