API Idempotency Keys: Patterns That Survive Real Concurrency

Idempotency keys let clients retry safely without producing duplicate side effects. The basic mechanism is straightforward, but the production patterns — handling in-flight duplicates, response caching, key scoping, expiry — have failure modes that aren't obvious until you've debugged them.

Idempotency keys are one of those API design choices that look optional until something goes wrong. A client sends a payment request. The network drops. The client retries. Did the first request succeed? The client doesn't know. Without idempotency keys, the safe retry behavior is to do nothing and prompt the user, which is a bad user experience. With idempotency keys, the client can retry freely and the server guarantees it won't double-process the request.

The basic pattern is simple enough that most teams implement it correctly the first time. The harder patterns — handling concurrent retries before the original request finishes, choosing what scope a key applies to, deciding when to expire keys, returning consistent responses — are where production bugs hide. We've implemented idempotency across DocuMint, CronPing, FlagBit, and WebhookVault, and the patterns below have survived everyday production use.

The minimum viable mechanism

The client sends a request with an Idempotency-Key header containing a unique value (typically a UUID generated client-side). The server processes the request, stores the response keyed by (idempotency_key, account_id), and returns the response. If a subsequent request arrives with the same idempotency key, the server returns the stored response without reprocessing.

The data structure is a table with three columns at minimum: idempotency_key, account_id, and response_payload (typically the full HTTP response body and status code). A unique index on (idempotency_key, account_id) handles deduplication. The response payload is stored once the request completes, and the same payload is returned on every subsequent retry.
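A minimal sketch of this mechanism, using SQLite so it runs standalone. The table name, the column split into status_code and response_body, and the helper names (open_store, lookup, store, handle) are assumptions made for illustration. This is deliberately the naive version: it has no protection against two requests arriving at the same time.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Create the idempotency table; all names here are illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS idempotency_keys (
            idempotency_key TEXT NOT NULL,
            account_id      TEXT NOT NULL,
            status_code     INTEGER NOT NULL,
            response_body   TEXT NOT NULL,
            created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            UNIQUE (idempotency_key, account_id)
        )
        """
    )
    return conn


def lookup(conn, key, account_id):
    """Return (status_code, body) for a stored response, or None."""
    row = conn.execute(
        "SELECT status_code, response_body FROM idempotency_keys "
        "WHERE idempotency_key = ? AND account_id = ?",
        (key, account_id),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))


def store(conn, key, account_id, status_code, body):
    """Persist the response so later retries can replay it."""
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO idempotency_keys "
            "(idempotency_key, account_id, status_code, response_body) "
            "VALUES (?, ?, ?, ?)",
            (key, account_id, status_code, json.dumps(body)),
        )


def handle(conn, key, account_id, process):
    """Replay a stored response if one exists, else process and store."""
    cached = lookup(conn, key, account_id)
    if cached is not None:
        return cached
    status_code, body = process()
    store(conn, key, account_id, status_code, body)
    return status_code, body
```

Calling handle twice with the same key and account runs process once; the same key under a different account is treated as a separate request.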

This handles the easy case: a single client retrying after a network blip, with several seconds between retries. The retry comes in after the original request has finished writing its response, finds the stored response, and returns it.

The harder case: concurrent retries

The case that breaks the naive implementation is concurrent retries. The client's HTTP library doesn't always wait for a response before retrying — some libraries retry on TCP-level signals or after timeouts that fire while the server is still processing the original request. The result is two simultaneous requests with the same idempotency key.

The naive flow goes: request A arrives, looks up the idempotency key, finds nothing, starts processing. Request B arrives, looks up the same idempotency key, finds nothing (because A hasn't finished), starts processing. Both requests succeed independently and the side effect happens twice — exactly what idempotency keys are supposed to prevent.

The fix is to claim the key atomically at request start. The first request inserts a row with the idempotency key and a status of "in_progress" inside a transaction that commits before processing begins. The unique index guarantees that only one such insert can succeed, even when two requests look up the key at the same instant. The second request's insert fails (or its lookup finds the in_progress row), and it either waits for the original to complete (returning the eventual response) or returns a 409 Conflict immediately with a "request in progress" body. Both behaviors are reasonable; the choice depends on whether clients can handle the wait.

Stripe's API documents the 409 behavior: a request with an in-progress idempotency key returns immediately with an error indicating the request is being processed. The client retries with backoff and eventually gets the stored response. This avoids holding HTTP connections open for long periods and avoids the cascading-timeout problem when the original request itself is slow.
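The claim step can be sketched as an atomic insert whose outcome decides who processes the request. This is a sketch under assumptions: it uses SQLite's INSERT OR IGNORE (in PostgreSQL the equivalent would be INSERT ... ON CONFLICT DO NOTHING), and the status values, helper names, and 409 body are illustrative rather than taken from any particular implementation.

```python
import sqlite3

IN_PROGRESS, COMPLETED = "in_progress", "completed"

WHERE_KEY = "WHERE idempotency_key = ? AND account_id = ?"


def open_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE idempotency_keys (
            idempotency_key TEXT NOT NULL,
            account_id      TEXT NOT NULL,
            status          TEXT NOT NULL,
            status_code     INTEGER,
            response_body   TEXT,
            UNIQUE (idempotency_key, account_id)
        )
        """
    )
    return conn


def claim(conn, key, account_id):
    """Try to claim the key. True means this request owns processing."""
    with conn:  # the claim is committed before any processing begins
        cur = conn.execute(
            "INSERT OR IGNORE INTO idempotency_keys "
            "(idempotency_key, account_id, status) VALUES (?, ?, ?)",
            (key, account_id, IN_PROGRESS),
        )
    return cur.rowcount == 1  # 0 rows means another request got there first


def handle(conn, key, account_id, process):
    if claim(conn, key, account_id):
        status_code, body = process()
        with conn:
            conn.execute(
                "UPDATE idempotency_keys "
                "SET status = ?, status_code = ?, response_body = ? " + WHERE_KEY,
                (COMPLETED, status_code, body, key, account_id),
            )
        return status_code, body

    # Lost the claim: either the original is still running or it has finished.
    status, status_code, body = conn.execute(
        "SELECT status, status_code, response_body FROM idempotency_keys "
        + WHERE_KEY,
        (key, account_id),
    ).fetchone()
    if status == IN_PROGRESS:
        return 409, "request in progress"
    return status_code, body
```

The design point is that the uniqueness check and the write happen in one statement, so there is no window between "look up" and "insert" for a second request to slip through.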

What scope does the key apply to

An idempotency key uniquely identifies a request, but "the same request" needs definition. Most implementations scope keys to (idempotency_key, account_id). Two different accounts can use the same idempotency key value without collision; within an account, the same key means the same request.

The trickier question is what happens when the same key is used for two requests with different bodies. One option is to return the stored response from the first request and ignore the new body. The reasoning is that the client is treating the key as identifying a logical operation, and if the body changes, that's almost certainly a bug.

The other option is to validate that the request body matches the stored body and return an error if not. Stripe's documentation describes this behavior: the idempotency layer compares incoming parameters to those of the original request and returns an error if they differ. This catches client bugs but creates a false sense of safety: the client could just retry with a fresh key and produce a duplicate side effect, defeating the purpose. We default to replaying the stored response and document it clearly.
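The two policies for a reused key with a different body can be sketched side by side. Everything here is hypothetical: the fingerprint format, the function names, the strict flag, and the choice of 422 for the mismatch error are assumptions made for illustration.

```python
import hashlib
import json


def fingerprint(method, path, body):
    """Stable hash of the request; key order in the body does not matter."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    raw = f"{method.upper()} {path}\n{canonical}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def resolve_replay(stored, incoming_fingerprint, strict=False):
    """Decide what a retry with an already-completed key gets back.

    stored is a dict with 'fingerprint', 'status_code', and 'body'.
    strict=False replays the first response and ignores the new body.
    strict=True rejects a request that differs from the original.
    """
    if strict and stored["fingerprint"] != incoming_fingerprint:
        # Assumed status code and message for the mismatch case.
        return 422, {"error": "idempotency key reused with a different request"}
    return stored["status_code"], stored["body"]
```

Storing the fingerprint alongside the response costs one extra column and keeps the choice between the two policies open, even when the default is to replay.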

What gets cached and what doesn't

The stored response should be the full HTTP response: status code, headers (or at least the application-meaningful ones), and body. On retry, the server reconstructs the same response.

The wrong implementation stores only the body and reconstructs the status code from a hardcoded 200, which produces incorrect responses for requests that originally returned 4xx. The cached response should faithfully reproduce what the client originally got, including errors. A request that originally failed with 422 (validation error) should return the same 422 on retry, not get reprocessed and potentially succeed because the underlying state has changed.

Server errors (5xx) are a special case. If the original request hit a 500, the cached response would force every retry to also see a 500, even after the underlying issue is fixed. Many implementations therefore choose not to cache 5xx responses: the key is released instead of being marked complete, and the retry is processed fresh.
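The caching rules in this section can be sketched as a small policy layer. The StoredResponse shape, the header allowlist, and the in-memory dict standing in for the table are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredResponse:
    status_code: int
    headers: dict = field(default_factory=dict)
    body: str = ""


# Headers worth replaying; an assumed allowlist, adjust per API.
REPLAYED_HEADERS = {"content-type", "location"}


def should_cache(status_code):
    """Cache 2xx through 4xx outcomes; let 5xx be retried fresh."""
    return status_code < 500


def to_stored(status_code, headers, body):
    """Keep the status code, the body, and only the allowlisted headers."""
    kept = {
        name.lower(): value
        for name, value in headers.items()
        if name.lower() in REPLAYED_HEADERS
    }
    return StoredResponse(status_code, kept, body)


def record(cache, key, status_code, headers, body):
    """Store a cacheable response under key, or release the key on 5xx."""
    if should_cache(status_code):
        cache[key] = to_stored(status_code, headers, body)
    else:
        cache.pop(key, None)
```

A 422 is stored and replayed exactly as the client first saw it, while a 503 leaves no entry behind, so the next retry is processed as if it were new.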

When do keys expire

Idempotency keys can't be kept forever, because the table would grow without bound. A common expiry window is 24 hours, which is long enough to cover any reasonable client retry pattern and short enough to keep the table small.

The expiry is a hard window, not a sliding one. A key created at noon on Monday expires at noon on Tuesday regardless of when it was last accessed. This avoids the case where a long-lived key persists indefinitely because of repeated retries.

The cleanup job that removes expired keys runs as a periodic background task. If the table grows quickly, the cleanup needs to run more frequently than once a day. Partitioning the idempotency table by day (PostgreSQL declarative partitioning, dropping the oldest partition daily) is the cleanest way to handle this at scale; for small services, a chunked DELETE WHERE created_at < NOW() - INTERVAL '24 hours' works fine.
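For the small-service case, the hard expiry window and the chunked delete might look like the sketch below. It uses SQLite and integer epoch seconds so it runs standalone, whereas the statement quoted above is PostgreSQL. The table layout, function names, and chunk size are assumptions.

```python
import sqlite3

TTL_SECONDS = 24 * 60 * 60


def is_expired(created_at, now):
    """Hard window: age is measured from creation, never from last access."""
    return now - created_at >= TTL_SECONDS


def purge_expired(conn, now, chunk_size=1000):
    """Delete expired rows in chunks; returns the number of rows removed."""
    cutoff = now - TTL_SECONDS
    removed = 0
    while True:
        with conn:  # one short transaction per chunk
            cur = conn.execute(
                "DELETE FROM idempotency_keys WHERE rowid IN ("
                "  SELECT rowid FROM idempotency_keys"
                "  WHERE created_at <= ? LIMIT ?)",
                (cutoff, chunk_size),
            )
        removed += cur.rowcount
        if cur.rowcount < chunk_size:
            return removed
```

Committing after each chunk keeps every transaction short, so the cleanup does not hold locks across the whole table while request traffic is inserting new keys.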

What about non-idempotent operations

Some operations don't have a natural idempotent retry. Sending an email, charging a card, posting a webhook — repeating these is inherently visible. Idempotency keys handle these correctly because the dedup happens at request entry, before any side effect occurs.

The pattern that breaks is when the side effect happens outside the database transaction. If processing the request involves calling an external API that has its own retry semantics, and the external call partially succeeds before the request fails, the idempotency key correctly returns the same response on retry — but the partial side effect at the external API has already happened, and the original request can't undo it.

The mitigation is to make the external call itself idempotent — pass an idempotency key to the external API, or use a deterministic operation ID that the external API can dedup on. Stripe-to-Stripe operations always pass idempotency keys for this reason.
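One way to get a deterministic operation ID is to derive it from the incoming key, so that every retry of the same request sends the same ID downstream. This is a sketch: the namespace UUID, the function name, and the step labels are hypothetical.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
DOWNSTREAM_NAMESPACE = uuid.UUID("6f1c2a9e-3b7d-4c58-9a41-2d0e8f5b7c13")


def downstream_key(account_id, idempotency_key, step):
    """Derive a stable key for one external call made while handling a request.

    The same (account, key, step) always yields the same value, so a retried
    request repeats the external call with the key it used the first time.
    """
    name = f"{account_id}:{idempotency_key}:{step}"
    return str(uuid.uuid5(DOWNSTREAM_NAMESPACE, name))
```

Including a step label matters when one request makes several external calls: each call needs its own stable key, or the external API would treat the second call as a duplicate of the first.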

Implementation across the four products

DocuMint uses idempotency keys on its /invoice endpoint to prevent duplicate PDFs from being generated and counted against quota during retries. CronPing uses them on the monitor-create endpoint to prevent duplicate monitors from network-blip retries. FlagBit uses them on flag-create endpoints. WebhookVault uses them on the endpoint-create and request-replay endpoints. The implementation is shared via a small library: a decorator that handles the lookup, the in-progress lock, and the response caching, leaving the endpoint code unchanged.
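To give a sense of what a decorator of this shape could look like, here is a minimal in-memory sketch. It is not the shared library itself: the handler signature, the dict-backed store, and the create_monitor example are all assumptions, and a real version would use a database table in place of the dict.

```python
import functools
import threading

_store = {}  # (idempotency_key, account_id) -> (state, response)
_store_lock = threading.Lock()


def idempotent(handler):
    """Wrap handler(account_id, payload) -> (status_code, body)."""

    @functools.wraps(handler)
    def wrapper(account_id, payload, idempotency_key=None):
        if idempotency_key is None:
            return handler(account_id, payload)  # no key, no dedup

        slot = (idempotency_key, account_id)
        with _store_lock:  # lookup and claim happen atomically
            entry = _store.get(slot)
            if entry is None:
                _store[slot] = ("in_progress", None)

        if entry is not None:
            state, response = entry
            if state == "completed":
                return response
            return 409, {"error": "request in progress"}

        try:
            response = handler(account_id, payload)
        except Exception:
            with _store_lock:
                _store.pop(slot, None)  # release the key so a retry can run
            raise

        with _store_lock:
            if response[0] >= 500:
                _store.pop(slot, None)  # do not cache server errors
            else:
                _store[slot] = ("completed", response)
        return response

    return wrapper


calls = []  # records each real execution of the example handler


@idempotent
def create_monitor(account_id, payload):
    calls.append(payload["name"])
    return 201, {"monitor": payload["name"]}
```

The endpoint body stays a plain function; the lookup, the in-progress claim, and the response caching all live in the wrapper.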

The deeper observation

Idempotency keys are one of the small set of API design choices that pay back disproportionately. The implementation is one table and a few hundred lines of middleware. The behavior they enable — clients can retry freely, network failures don't produce duplicate side effects, distributed systems don't need exactly-once delivery from the network — solves a class of problems that would otherwise require much more complex solutions. The cost is small enough that they belong in any API that mutates state on the server, and the alternative — leaving clients to figure out retry safety on their own — produces a category of subtle bugs that rarely surface in development and become routine in production.
