The Idempotency Token Pattern: Why Your API Needs One Even If Stripe Does Not Force It

Every API that accepts non-idempotent operations eventually has the duplicate-charge incident. A customer's mobile client retries a payment because the request appeared to hang. The original request actually succeeded but the response was lost on the way back. The retry succeeds too. The customer is charged twice. The support ticket is filed. The engineering team realizes that the API has been doing this for months at a low rate, and the only reason it has not blown up sooner is that mobile networks have been more reliable than expected.

Stripe popularized the idempotency-key header as the canonical solution to this problem. The pattern is widely copied because it works: the client generates a unique key per logical operation, sends it as a header on every retry of that operation, and the server records the result of the first attempt and returns the cached response on every subsequent attempt. We use idempotency tokens across DocuMint (invoice generation), CronPing (monitor creation), FlagBit (flag mutation), and WebhookVault (endpoint creation), and the design choices that look small at the start matter at scale.

The minimum viable token mechanism

The smallest implementation that actually works is a table with a handful of columns: the account identifier, the idempotency key (a string), a state marker (in-progress or complete), the response status (integer), and the response body (TEXT or JSONB). The table needs a unique constraint on the idempotency key, scoped per account, which in practice means a composite constraint on the account identifier and the key. The endpoint handler does the following on every request that includes an idempotency key:

  1. Attempt to insert a row with the key, marking it as in-progress.
  2. If the insert succeeds, this is the first attempt. Execute the operation, write the response back to the row, return the response.
  3. If the insert fails with a unique-constraint violation, this is a retry. Read the existing row. If it is complete, return the stored response. If it is still in-progress, return a 409 with a hint to retry later.

The unique-constraint-violation-as-signal pattern is the right primitive because it is atomic: there is no race condition between the check and the insert. Two concurrent retries of the same key will produce exactly one successful insert and one violation, regardless of timing.
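
The three steps above can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module, not a description of any production handler; the table name, column names, `open_store`, and `handle` are all assumptions made for the example, and a real service would use its own database and error handling.

```python
import sqlite3


def open_store():
    """Create an in-memory table for the sketch (schema is illustrative)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE idempotency_keys (
            account_id      TEXT NOT NULL,
            idem_key        TEXT NOT NULL,
            state           TEXT NOT NULL,   -- 'in_progress' or 'complete'
            response_status INTEGER,
            response_body   TEXT,
            UNIQUE (account_id, idem_key)
        )
        """
    )
    return db


def handle(db, account_id, idem_key, operation):
    """Run `operation` at most once per (account_id, idem_key).

    `operation` is a hypothetical callable returning (status, body).
    """
    try:
        # Step 1: the insert itself is the check. No SELECT comes first.
        with db:
            db.execute(
                "INSERT INTO idempotency_keys (account_id, idem_key, state) "
                "VALUES (?, ?, 'in_progress')",
                (account_id, idem_key),
            )
    except sqlite3.IntegrityError:
        # Step 3: unique-constraint violation means this is a retry.
        state, status, body = db.execute(
            "SELECT state, response_status, response_body "
            "FROM idempotency_keys WHERE account_id = ? AND idem_key = ?",
            (account_id, idem_key),
        ).fetchone()
        if state == "complete":
            return status, body
        return 409, "request with this key is still in progress; retry later"

    # Step 2: first attempt. Execute, then record the outcome on the row.
    # If `operation` raises, the row stays in_progress in this sketch.
    status, body = operation()
    with db:
        db.execute(
            "UPDATE idempotency_keys "
            "SET state = 'complete', response_status = ?, response_body = ? "
            "WHERE account_id = ? AND idem_key = ?",
            (status, body, account_id, idem_key),
        )
    return status, body
```

Calling `handle` twice with the same account and key runs the operation once and returns the stored status and body the second time.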

The scoping question

The idempotency key must be scoped per account, not globally. Two different customers might independently generate the same UUID; both should be allowed to use it without one client's request being mistakenly treated as a retry of the other's. The scope can be tighter than per-account if the API has finer-grained concepts: per-API-key, per-project, per-team. The right scope is the smallest scope within which a single client could reasonably generate two requests with the same key.

The scope must not be coarser than per-account, even on read-mostly endpoints. A multi-tenant SaaS that scopes idempotency keys globally is a single bug away from one customer's retry returning another customer's data, which is the kind of cross-tenant leak that destroys trust permanently.
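
A small sketch of what account scoping means at the schema level, again using sqlite3 purely for illustration. The composite primary key and the `claim` helper are assumptions for the example; the point is only that the account identifier is part of the uniqueness check.

```python
import sqlite3


def make_scoped_table():
    """Table whose uniqueness is (account, key), never key alone."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE idempotency_keys ("
        " account_id TEXT NOT NULL,"
        " idem_key   TEXT NOT NULL,"
        " PRIMARY KEY (account_id, idem_key))"
    )
    return db


def claim(db, account_id, idem_key):
    """Return True if this (account, key) pair is new, False if it is a retry."""
    try:
        with db:
            db.execute(
                "INSERT INTO idempotency_keys (account_id, idem_key) VALUES (?, ?)",
                (account_id, idem_key),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this shape, the same key sent by two different accounts is claimed successfully by both, while a second claim by the same account is reported as a retry.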

The response-fidelity requirement

The stored response must include the status code, not just the body. Clients retry requests that carry an idempotency key in cases other than network failure: a 503 response from an overloaded backend, a 429 from rate limiting, a 4xx validation failure. If the stored response is just the body, a retry of a request that previously returned a 4xx will return 200 with the error body, and the client's error handling will misinterpret it as success.

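A common convention is to cache every response except 5xx and other transient failures. 5xx responses are not cached because the server did not finish processing the request, so a retry might genuinely produce a different result if the underlying issue has resolved. 4xx validation responses are cached because the same input will produce the same validation error regardless of when it is processed. A 429 is the exception among 4xx codes: it depends on load rather than on the input, so it belongs with the 5xx responses. Stripe's documented behavior differs in the details: it saves the result of any request whose endpoint began executing, 500 errors included, and saves nothing for requests rejected by up-front validation.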

The cache should also include the response headers, particularly any custom headers that the client might be expecting. Rate-limit headers need separate handling: the client uses them to schedule subsequent requests, and a replay that returns the stored, stale values can cause the client to back off when it does not need to or to retry too aggressively. Regenerate them at replay time instead of replaying the stored copies.
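
One way to express the storage and replay rules from this section, as a hedged sketch. The function names and the `x-ratelimit-` header prefix are assumptions (header naming varies between APIs), and two choices here go beyond what the text strictly states: 429 is excluded from storage because it depends on load, not input, and rate-limit headers are overlaid with current values at replay time so the client never sees stale ones.

```python
RATE_LIMIT_PREFIX = "x-ratelimit-"  # assumed naming convention, not a standard


def should_store(status):
    """Store non-5xx responses; skip 5xx and (by assumption) 429."""
    return status < 500 and status != 429


def build_replay(stored_status, stored_headers, stored_body, current_rate_limit_headers):
    """Rebuild a response for a retry from the stored first attempt.

    Stored rate-limit headers are dropped and replaced with fresh values;
    every other stored header is replayed unchanged.
    """
    headers = {
        name: value
        for name, value in stored_headers.items()
        if not name.lower().startswith(RATE_LIMIT_PREFIX)
    }
    headers.update(current_rate_limit_headers)
    return stored_status, headers, stored_body
```

A stored 422 with its original body is replayed as a 422, which is what keeps the client's error handling correct on retry.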

The concurrent-retry case

The in-progress marker and its 409 response are the part that gets skipped most often, and skipping them produces the most subtle bugs. If the implementation checks for an existing row first and inserts afterwards, with no unique constraint to arbitrate, two retries that arrive simultaneously both find no existing row, both insert, and both execute the operation. The customer is double-charged. The idempotency mechanism that was supposed to prevent the double charge has instead created a more confusing version of the same bug.

The right pattern is: insert the row with a status of "in_progress" before executing the operation. If the insert succeeds, proceed. If it fails with a unique-constraint violation, check the existing row. If it is still in-progress, return 409 with Retry-After. If it is complete, return the stored response. The in-progress lock prevents concurrent execution; the 409 response gives the client a clean way to back off and retry.

The timeout on in-progress entries is the next consideration. If the server crashes mid-request, the in-progress row will persist until something cleans it up. A reasonable default: in-progress entries older than 60 seconds are considered abandoned, and the next retry takes over. The application can either delete the abandoned row and let the retry insert a new one, or take over the existing row in place and treat the request as fresh. Either way, taking over is only appropriate when the underlying operation is safe to execute again after a partial run; if it is, treat the request as fresh.
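
The takeover step has its own race: two retries can both notice the same abandoned row. A minimal sketch of one way to handle that, assuming a `started_at` column holding epoch seconds (a column and helper names invented for this example), is a conditional UPDATE whose row count says who won.

```python
import sqlite3
import time

STALE_AFTER_SECONDS = 60  # the default suggested in the text


def open_lock_table():
    """In-memory table for the sketch; schema is illustrative."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE idempotency_keys ("
        " account_id TEXT NOT NULL,"
        " idem_key   TEXT NOT NULL,"
        " state      TEXT NOT NULL,"
        " started_at REAL NOT NULL,"
        " UNIQUE (account_id, idem_key))"
    )
    return db


def try_take_over(db, account_id, idem_key, now=None):
    """Atomically claim an abandoned in-progress row.

    Returns True for exactly one caller: the UPDATE only matches while the
    row is still stale, and it refreshes started_at as it claims the row.
    """
    now = time.time() if now is None else now
    with db:
        cur = db.execute(
            "UPDATE idempotency_keys SET started_at = ? "
            "WHERE account_id = ? AND idem_key = ? "
            "AND state = 'in_progress' AND started_at < ?",
            (now, account_id, idem_key, now - STALE_AFTER_SECONDS),
        )
    return cur.rowcount == 1
```

A caller that gets False should fall back to the normal 409 path, since either the row is not yet stale or another retry has already taken it over.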

The expiry question

Idempotency entries should not be kept forever. The retention period is a trade-off between guard window (longer is safer) and storage cost (shorter is cheaper). Stripe uses 24 hours, which is enough to cover essentially all client retry scenarios and short enough to keep the table small. We use 24 hours across all four products.

The expiry mechanism should be partition-drop, not row-delete. Partitioning the idempotency table by day and dropping old partitions is much faster than running DELETE queries against billions of rows. The trade-off is slightly more complex schema management; the gain is that cleanup is constant-time regardless of table size.
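
As a rough sketch of the bookkeeping involved, the helper below generates the statements a daily maintenance job might run. It assumes PostgreSQL-style declarative range partitioning on a creation-date column and a `idempotency_keys_YYYYMMDD` naming scheme; both are assumptions for illustration, and the statements are only built as strings here, not executed.

```python
from datetime import date, timedelta

TABLE = "idempotency_keys"  # hypothetical parent table, partitioned by day


def partition_name(day):
    """Name of the partition holding rows created on `day`."""
    return f"{TABLE}_{day:%Y%m%d}"


def create_partition_sql(day):
    """Statement that creates the partition covering [day, day + 1)."""
    next_day = day + timedelta(days=1)
    return (
        f"CREATE TABLE IF NOT EXISTS {partition_name(day)} "
        f"PARTITION OF {TABLE} "
        f"FOR VALUES FROM ('{day.isoformat()}') TO ('{next_day.isoformat()}')"
    )


def drop_expired_sql(today, existing_days, retention_days=1):
    """Statements that drop partitions whose newest row is past retention.

    Yesterday's partition is kept with a one-day retention, because it can
    still contain rows that are less than 24 hours old.
    """
    cutoff = today - timedelta(days=retention_days)
    return [
        f"DROP TABLE IF EXISTS {partition_name(day)}"
        for day in sorted(existing_days)
        if day < cutoff
    ]
```

With daily partitions the effective retention is between 24 and 48 hours, which is the price of dropping whole partitions instead of deleting individual rows.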

The external-side-effect problem

The hardest case for idempotency is when the operation has external side effects. A DocuMint request that generates a PDF and stores it in S3 cannot be made idempotent purely by caching the response: the S3 storage operation is not idempotent without explicit support. The pattern is to pass the idempotency key down to the storage operation as the object key (or as part of the object key), so that retries of the same logical operation produce the same storage state.

This pattern composes if every external dependency offers some way to make a repeated call safe. Stripe supports idempotency keys directly. S3 has no idempotency-key concept, but writing to a deterministic object key gets the same effect, because a repeated PUT overwrites the object with identical content. SQS FIFO queues support it via message deduplication IDs, within a five-minute deduplication window. Many modern APIs offer something similar. The pattern breaks when a dependency does not, in which case the application has to either build its own check-then-act layer (with the same race condition the idempotency mechanism was supposed to prevent) or accept that some operations will be retried at the boundary.
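
A minimal sketch of the object-key idea, using an in-memory stand-in for the object store so the example stays self-contained. The `invoices/` path layout, the SHA-256 derivation, and every name below are assumptions for illustration and do not describe DocuMint's actual storage layout.

```python
import hashlib


def object_key_for(account_id, idem_key, suffix=".pdf"):
    """Derive a deterministic object key from the account and idempotency key."""
    digest = hashlib.sha256(f"{account_id}\x00{idem_key}".encode("utf-8")).hexdigest()
    return f"invoices/{account_id}/{digest}{suffix}"


class InMemoryBucket:
    """Stand-in for an object store where a put to an existing key overwrites."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data


def store_invoice(bucket, account_id, idem_key, pdf_bytes):
    """Write the PDF under a key derived from the idempotency key.

    A retry of the same logical operation lands on the same object key,
    so it overwrites the first write instead of creating a second object.
    """
    key = object_key_for(account_id, idem_key)
    bucket.put(key, pdf_bytes)
    return key
```

Hashing the idempotency key keeps client-supplied strings out of the storage path while preserving the one-key-per-operation property.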

The client-generated-key requirement

The idempotency key must be generated by the client, not by the server. A server-generated key returned in a response is useless because the client cannot use it on a retry: the client only sees the key if the response actually arrives, but the case we are guarding against is the response not arriving.

The recommended format is UUIDv4 or any other random 128-bit identifier. The key only needs to be unique within the scoping window (account plus 24 hours), which UUIDv4 trivially satisfies. The key should be opaque from the server's perspective; the server should not parse it or derive meaning from it.

The client generates the key once per logical operation and uses it for every retry of that operation. The client's retry logic has to maintain the key across retries, which means it has to be stored somewhere persistent for the duration of the retry window. Mobile clients with local SQLite databases handle this well; backend-to-backend integrations need a small persistent retry queue.
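
On the client side, the essential property is that the key is created before the first attempt and reused unchanged on every retry. The sketch below assumes a hypothetical `send` callable, a `TransientError` exception type, and an `Idempotency-Key` header name; it holds the key in memory only and omits backoff, so a durable client would persist the key before the first attempt and pass it in through `idem_key` after a restart.

```python
import uuid


class TransientError(Exception):
    """Raised by the hypothetical transport when the outcome is unknown."""


def send_with_retries(send, payload, max_attempts=3, idem_key=None):
    """Send `payload`, reusing one idempotency key across all attempts.

    `send(payload, headers=...)` is an assumed transport function. Pass
    `idem_key` to resume an operation whose key was persisted earlier.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if idem_key is None:
        idem_key = str(uuid.uuid4())  # generated once per logical operation
    last_error = None
    for _attempt in range(max_attempts):
        try:
            return send(payload, headers={"Idempotency-Key": idem_key})
        except TransientError as exc:
            last_error = exc  # same key is reused on the next attempt
    raise last_error
```

Generating the key inside the retry loop instead of before it is the usual mistake, and it silently turns every retry into a new operation.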

What this is and is not

The honest summary: the idempotency token pattern has a small surface and a low operational cost, and it eliminates a category of customer-facing bugs that would otherwise be expensive to investigate and embarrassing to admit. The cost is one table, one extra indexed write per request, and the discipline to design the operation so that the same idempotency key always produces the same result.

It is not a substitute for actually-idempotent operations where they are possible. A GET request is idempotent by HTTP definition. A PUT request that overwrites a resource is idempotent because retrying produces the same final state. A DELETE that removes a resource is idempotent because deleting twice is a no-op. The idempotency token pattern is the right answer for the operations that are not naturally idempotent: POST requests that create resources, payment operations, anything that produces an externally-visible side effect that you do not want repeated.

The pattern has been independently invented and re-invented by every team that operates an API at scale. The version that has converged across Stripe, Square, Plaid, Twilio, and most modern fintech APIs is the version described above: client-generated keys, account-scoped uniqueness, response caching including status, 409 for in-progress conflicts, 24-hour retention. There are local variations, but the core has stabilized enough that a client written against any one of them can be adapted to any other with hours of work, not weeks. That kind of convergence is uncommon in API design, and it is worth respecting when designing a new API.
