Designing Stable Public Identifiers: When the Primary Key Is Not the Right Customer-Facing ID
Auto-increment primary keys leak business information, expose enumeration attacks, and constrain migration paths. Replacing them at the API boundary with a separate public identifier is one of the schema decisions that pays off most over time. The patterns that work and the ones that don't.
The primary key is the database's natural identifier for a row. The public identifier is the one that appears in URLs, in API responses, on invoices, in webhook payloads, in customer support tickets. Treating these as the same thing is a default that mostly works at small scale and hurts in proportion to the system's age. We've seen the cost across DocuMint, CronPing, FlagBit, and WebhookVault, and the pattern is consistent: the second identifier earns its weight, but the time to add it is during the design phase, not after the first customer integration is live.
What auto-increment IDs leak
An auto-increment integer primary key encodes information you usually don't want to expose. The number 47 in a URL means there are at least 47 of these things; competitors enumerating IDs can estimate your customer count, your throughput, and the rate at which the system is growing. A jump in IDs between two of your customer's records reveals how many other customers used the system in between. None of this is private information in the strict sense, but most companies don't intend to publish it.
Auto-increment IDs also enable enumeration attacks. If /api/v1/invoices/47 is a valid URL, then so is /api/v1/invoices/48 through /api/v1/invoices/N. Every endpoint that doesn't enforce per-tenant authorization rigorously has a vulnerability. Replacing the integer with a random unguessable identifier doesn't substitute for proper authorization, but it raises the bar for casual enumeration and limits the blast radius of a missing check.
What public identifiers should look like
The public ID should be a string with a recognizable prefix and an unguessable suffix. Stripe's pattern — cus_ for customers, inv_ for invoices, sub_ for subscriptions — is the convention most developers find familiar. The prefix makes the ID self-describing in logs and error messages; the suffix is some encoding of randomness or time-ordered randomness.
For DocuMint we use doc_ prefixed IDs for invoices, tmpl_ for templates, key_ for API keys. For CronPing we use mon_ for monitors, ping_ for individual pings, alert_ for alert configurations. The prefix is two-to-four characters long; the suffix is base62 encoding of either UUID v4 or a ULID, depending on whether time-ordering matters for the type. Keys benefit from time ordering because chronological lookup is common; invoices don't, because they're indexed by other things.
The length matters. Stripe uses 24-character suffixes, which is enough randomness to make collision impossible at any scale we're likely to see. Shorter suffixes are easier to read and copy but provide less collision margin. We've settled on 16-character base62 suffixes (about 95 bits of entropy) as the right balance for our scale.
The schema
The internal primary key stays as a 64-bit integer or UUID. The public ID is a separate column with a unique index. Application code looks up by public ID at the API boundary and uses the integer key for joins, foreign keys, and internal references. The conversion happens once per request, at the edge.
CREATE TABLE invoices (
id BIGSERIAL PRIMARY KEY,
public_id TEXT UNIQUE NOT NULL,
tenant_id BIGINT NOT NULL REFERENCES tenants(id),
-- ... other columns
CHECK (public_id ~ '^doc_[A-Za-z0-9]{16}$')
);
CREATE INDEX idx_invoices_tenant ON invoices(tenant_id, created_at DESC);The CHECK constraint is optional but useful — it catches application bugs that produce malformed IDs. The unique index on public_id is required because the API does point lookups on it.
The migration story
Adding public IDs to a system that doesn't have them is easier than most schema migrations, because the public ID is additive and old integer-based URLs can keep working during the transition. The four-step pattern: add the public_id column nullable, generate IDs for existing rows in a backfill script, add the unique constraint and NOT NULL, and update API endpoints to accept and return public IDs.
Old URLs with integer IDs can either keep working (with the integer as a deprecated alternate identifier) or redirect to the new URL. We've kept old URLs working for a year before formally deprecating them, with deprecation warnings in API responses and email notifications to integrators. The cost of keeping them working is small; the customer goodwill is large.
The webhook complication
Webhook payloads should always use public IDs. Customers building integrations against webhook events expect stable identifiers that they can store and look up later. Sending integer primary keys in webhooks creates a permanent dependency on internal database structure that you can't break without breaking customer integrations.
This is where we've seen the biggest payoff for designing public IDs early. WebhookVault has been delivering events with public IDs from the start, which means we've never had to migrate customer integrations through an ID change. Stripe's API has been delivering events with public IDs for over a decade, which is part of why their integration story remains unusually clean.
The patterns that fail
Three patterns we've seen and avoided:
Hashing the integer ID to produce a public ID. This is reversible if the hash function is known, doesn't add real entropy, and creates a tight coupling between the two ID spaces. If you ever need to reissue an integer ID (for example, during a database migration), the public ID changes, breaking customer integrations.
Using the integer ID directly in URLs and asserting that authorization is sufficient. Authorization checks are the right defense in depth, but they're not a substitute for unguessable identifiers. A single missing check on a single endpoint exposes everything. With unguessable IDs, the same missing check still has to be fixed, but the blast radius during the window before the fix is far smaller.
Using sequential IDs with a high random offset. The IDs look random because they're large numbers, but they're still sequential at scale and still leak the same business information. The first time someone notices, the obfuscation is gone.
The deeper observation
The case for separate public identifiers is similar to the case for separate audit logs and separate config tables: it's adding something the system doesn't strictly need at small scale, in exchange for not having to retrofit it at large scale. The retrofit is much harder than the addition. Customer integrations stored against integer IDs become millions of references that have to be migrated; webhook consumers built against integer IDs become integration code at every customer that has to be updated. The same decisions that look like over-engineering at year one look like saved years of work at year five. Public identifiers are one of the cheapest of those decisions, and one of the easiest to skip.