Database ID Generation: Auto-Increment, UUIDs, ULIDs, and Snowflake IDs
The choice of primary-key strategy is one of the earliest schema decisions and one of the hardest to reverse. Auto-increment integers, UUIDs, ULIDs, and Snowflake-style IDs each have specific structural costs and benefits that compound across the lifetime of the system.
Every table needs a primary key, and the choice of how to generate it gets made early — often in the first migration of a project. The decision is structural in a way that few other schema decisions are: changing a primary-key strategy after the table has data and after foreign keys reference the column is a multi-week migration project that touches every reference and every index that depends on the column. The strategy you pick on day one is, for practical purposes, the strategy you live with for years.
This post covers the four strategies that account for almost all production use — auto-increment integers, UUIDs, ULIDs, and Snowflake-style IDs — and the structural trade-offs between them. The patterns apply across the four products in our studio — DocuMint, CronPing, FlagBit, and WebhookVault — though we use a mix of strategies depending on what each table represents.
Auto-increment integers
The default in most introductions to SQL is the auto-incrementing integer. PostgreSQL's BIGSERIAL, MySQL's AUTO_INCREMENT, SQLite's INTEGER PRIMARY KEY: all give you a 64-bit integer that the database hands out monotonically. The strategy is the cheapest in storage (8 bytes) and the cheapest in index cost (B-tree inserts at the right edge of the index, no fragmentation, excellent cache behavior because recent rows cluster). The performance properties are hard to beat.
The cost is in three structural places. First, the IDs leak business information: an attacker who sees order #4791 today and order #4823 tomorrow knows you process around 32 orders per day. For consumer products this can be acceptable; for SaaS where competitors might be observing your sign-up rate, it is a leak you may not want to make. Second, the IDs are not generated client-side: a client cannot mint an ID before the round-trip to the database, which complicates patterns like idempotency keys and offline-first applications. Third, in a sharded or distributed-write topology, coordinating a single global counter is a synchronization point that becomes a bottleneck at scale.
Auto-increment is the right answer when the table is small in scope, written from a single application instance, and the IDs do not need to be exposed to clients. Internal join tables, configuration tables, audit logs that are referenced only by the application — these are good fits. We use auto-increment for almost all internal-only tables in our four products.
UUIDs (v4 random)
UUIDs solve the structural problems of auto-increment by being globally unique without coordination. A UUID v4 is 128 bits of cryptographically random data formatted as a 36-character string. Anyone, anywhere, can mint a UUID without coordinating with anyone else and the probability of a collision is negligible. UUIDs do not leak information about ordering or volume; they can be generated client-side; they work in a sharded topology without a coordination point.
The costs are real and easy to underestimate. A UUID is 16 bytes vs 8 for a BIGINT, and the random distribution means that B-tree inserts happen at random positions in the index, causing page splits and fragmentation. The index itself is twice the size, and the cache hit rate is worse because consecutive rows do not cluster. On a write-heavy workload, the difference between auto-increment and random UUID primary keys can be 2-4x in throughput once the index no longer fits in RAM.
The standard mitigations are PostgreSQL's UUID type (which stores the 16 bytes binary rather than the 36-char string) and avoiding the temptation to also index a string representation. The deeper mitigation, if you need UUID-shaped IDs, is to use a sortable variant — which is what ULIDs and time-ordered UUIDs solve.
ULIDs and time-ordered UUIDs
The ULID specification (Universally Unique Lexicographically Sortable Identifier) and the newer UUID v7 standard both address the random-insertion problem by encoding a timestamp in the high bits of the ID. The format is 48 bits of millisecond timestamp followed by 80 bits of randomness for ULIDs, or 48 bits of timestamp followed by 74 bits of randomness for UUID v7. The result is an ID that is globally unique without coordination, sortable by creation time, and clusters well in a B-tree index.
The structural benefits are substantial. Index inserts happen at the right edge of the B-tree most of the time, recovering the cache and fragmentation behavior of auto-increment. Sorting by ID gives you sort-by-creation-time for free. Range queries by time can use the primary key index instead of a separate created_at index. Cursor pagination works directly on the primary key without needing a composite cursor.
The trade-off relative to UUID v4 is a small information leak: the timestamp is recoverable from the ID, so an attacker who sees an ID can infer when the row was created. For most applications this is acceptable — the timestamp is usually exposed in API responses anyway — but for cases where creation time is sensitive, UUID v4 remains the right choice. We use ULIDs for any externally-exposed primary key in our products: API keys, webhook IDs, monitor tokens, feature flag rule IDs.
Snowflake-style IDs
Twitter's original Snowflake design and its many descendants (Discord, Instagram, Sony PlayStation Network) pack a 64-bit integer with three fields: a 41-bit timestamp, a 10-bit machine ID, and a 12-bit per-machine sequence number. The result fits in a BIGINT (8 bytes, same as auto-increment), generates without database coordination, sorts roughly by creation time, and avoids the cache-hostile randomness of UUID v4.
The benefits relative to ULIDs are storage size (8 bytes vs 16) and the ability to use the ID directly in arithmetic contexts that expect integers. The cost is the operational complexity of assigning machine IDs across a deployment: each ID-generating process must have a unique 10-bit identifier, which requires either a coordination service or a careful deployment discipline. In practice this means Snowflake IDs make sense at scale where the storage win matters, and ULIDs make more sense at smaller scale where the coordination cost does not earn its weight.
The 41-bit timestamp gives roughly 70 years from a chosen epoch. Most Snowflake implementations choose an epoch close to their first deployment, which is fine until the system outlives the original team and someone has to find the magic number that defines what year is encoded by timestamp 0.
The honest decision tree
For tables that are internal-only, written from a single application, and never exposed externally: auto-increment integers. The performance and storage characteristics are hard to beat and the ID leak is irrelevant when the IDs are not exposed.
For tables whose primary keys appear in URLs or API responses, are generated by clients, or come from multiple writers: ULIDs or UUID v7. The sortable property recovers the index behavior of auto-increment without the coordination cost.
For workloads where storage and arithmetic matter and you have the operational discipline to assign machine IDs: Snowflake IDs.
For workloads where any ordering information leak is unacceptable: UUID v4, with the awareness that you are paying real performance cost for the property.
The deeper observation
The choice of primary-key strategy is one of the earliest schema decisions and one of the hardest to reverse. Most teams default to whatever their framework's tutorial example used, which is usually auto-increment integers. That default is right for many cases and wrong for others, and the cost of the wrong default does not show up until the system is large enough that migrating is expensive. Knowing the four strategies and their structural trade-offs before the first migration is one of the higher-leverage pieces of database knowledge a backend engineer can have.