Designing API Idempotency for Bulk Operations: Per-Item vs Per-Batch Tradeoffs

Single-request idempotency is a solved problem: the client sends a unique key, the server records the response keyed by that key, and retries return the cached response without re-executing the side effect. Bulk operations break this model in a non-obvious way. A batch of 1000 items is not a single side effect; it is 1000 side effects bundled into a single request. When the request times out partway through, the client retries, and the question is what should happen to items 1 through 500 that already succeeded.

The two patterns that B2B SaaS APIs use are per-batch idempotency and per-item idempotency, and the choice between them is more consequential than it first appears. The wrong choice produces APIs that customers eventually work around with brittle client-side bookkeeping, and the right choice is one of the differentiators between APIs that customers trust at scale and APIs they tolerate.

The two patterns defined

Per-batch idempotency treats the entire bulk request as the unit of deduplication. The client supplies one idempotency key for the batch; the server records the batch outcome (success, partial success with per-item results, or failure) under that key. On retry with the same key, the server returns the recorded outcome without re-executing any items. The pattern is simple to implement and easy to reason about, but it has a serious failure mode. If the batch partially succeeded and then errored, a retry with the same key only replays the recorded partial outcome, so the client cannot resume from the failure point. The only way to get the remaining items processed is to send a new key, and a new key makes the server re-execute every item, creating new side effects for the items that already succeeded.
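A minimal sketch of the per-batch pattern, to make the failure mode concrete. The in-memory dictionary, the handle_batch function, and the create_invoice callback are all hypothetical stand-ins; a real service would keep the outcome in a durable table.

```python
from typing import Callable

# Hypothetical in-memory store; a real service would use a durable table.
_batch_outcomes: dict[tuple[int, str], list[dict]] = {}

def handle_batch(account_id: int, batch_key: str, items: list[dict],
                 execute: Callable[[dict], dict]) -> list[dict]:
    """Run the batch once per (account_id, batch_key); replay the outcome on retry."""
    record_key = (account_id, batch_key)
    if record_key in _batch_outcomes:
        # Retry with the same key: nothing is re-executed.
        return _batch_outcomes[record_key]
    results = []
    for item in items:
        try:
            results.append({"status": "succeeded", "body": execute(item)})
        except Exception as exc:
            results.append({"status": "failed", "error": str(exc)})
    _batch_outcomes[record_key] = results
    return results

executed = []  # records every real side effect

def create_invoice(item: dict) -> dict:
    executed.append(item["order_id"])
    return {"invoice_for": item["order_id"]}

items = [{"order_id": 1}, {"order_id": 2}]
first = handle_batch(7, "batch-a", items, create_invoice)
retry = handle_batch(7, "batch-a", items, create_invoice)    # replayed from the record
rekeyed = handle_batch(7, "batch-b", items, create_invoice)  # new key: every item runs again
```

After these three calls, executed holds each order ID twice: the same-key retry added nothing, and the re-keyed call repeated both side effects.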

Per-item idempotency treats each item in the batch as the unit of deduplication. The client supplies an idempotency key per item (or the server derives one from item content); the server records per-item outcomes. On retry, the server skips items already recorded as successful and processes items not yet recorded. The pattern is more complex to implement but supports correct resumption from partial failure without client coordination. The cost is that the idempotency record table grows in proportion to total item volume rather than total batch volume.

When per-batch is correct

Per-batch idempotency is correct when the batch is atomic at the application level: either all items succeed or all items fail, with no partial-success state. The pattern fits well for batches that map to a single database transaction with all-or-nothing semantics, where the server commits or rolls back the entire batch and the client never sees partial results.

The pattern is also correct when the batch is small enough that retrying the entire batch is acceptable. A batch of 10 items is fine with per-batch idempotency even without strict atomicity, provided that redoing the items that already succeeded costs little. The threshold depends on the per-item cost: for cheap items, batches of up to 100 are reasonable with per-batch idempotency; for expensive items, the threshold drops to under 10.

The pattern is incorrect when the batch is large and partial-success is the normal mode of failure. A batch of 1000 invoice generations where items 1-500 succeed and item 501 fails leaves the client with two bad options: change the idempotency key and retry, creating duplicate invoices for items 1-500, or accept the partial success and manually retry items 501-1000, requiring per-item state tracking on the client side. Neither option is good, and both produce customer-support tickets.

When per-item is correct

Per-item idempotency is correct when partial success is normal and resumability is valued. Most bulk operations in real B2B SaaS APIs fit this case: bulk import of records, batch send of webhooks, batch generation of PDFs, batch processing of events. In all of these, the natural failure mode is partial-success-then-error, and the natural client retry pattern is resume-from-the-failure-point.

The implementation requires per-item idempotency keys. The keys can be client-supplied (the client generates a UUID per item and includes it in the request) or server-derived (the server computes a stable hash of the item contents, or builds the key from a logical identifier the item already carries). Client-supplied keys are simpler and more general; server-derived keys are appropriate when the item naturally has a stable identity. For example, a request to generate an invoice for order #12345 that arrives three times should produce one invoice, and the order ID is the natural idempotency key.
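The three ways of obtaining a key can be sketched as follows. The function names and the "operation:identifier" key format are illustrative assumptions, not a prescribed scheme.

```python
import hashlib
import json
import uuid

def client_supplied_key() -> str:
    """Client side: a random UUID, generated once per item and reused on every retry."""
    return str(uuid.uuid4())

def content_hash_key(item: dict) -> str:
    """Server side: SHA-256 of a canonical JSON encoding, so field order does not matter."""
    canonical = json.dumps(item, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def natural_key(operation: str, logical_id: str) -> str:
    """Server side: key built from a stable identifier the item already carries."""
    return f"{operation}:{logical_id}"

# The same logical item yields the same derived key regardless of field order.
same = content_hash_key({"order_id": 12345, "currency": "EUR"}) == \
       content_hash_key({"currency": "EUR", "order_id": 12345})
invoice_key = natural_key("invoice", "12345")
```

A content hash treats two genuinely separate requests with identical contents as one item, which is only correct when identical contents really do mean the same work.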

The schema for per-item idempotency

The minimum viable schema requires three tables: a batch-request table recording the overall request shape, a batch-item table recording per-item state, and an idempotency table mapping per-item keys to results. The batch-request table is mostly for observability; the work is in the batch-item and idempotency tables.

The idempotency table schema: CREATE TABLE bulk_idempotency (account_id BIGINT NOT NULL, item_key TEXT NOT NULL, request_id UUID NOT NULL, status TEXT NOT NULL, response_body JSONB, created_at TIMESTAMPTZ DEFAULT now(), PRIMARY KEY (account_id, item_key)). The composite primary key on (account_id, item_key) does two jobs. Scoping keys by account prevents cross-account leaks, and the uniqueness it enforces is the atomic primitive that makes the pattern work. INSERT ... ON CONFLICT DO NOTHING claims ownership of an item without explicit locking: exactly one concurrent request inserts the row, and every other request sees that it already exists.
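The claim step can be tried locally with a short sketch. It uses SQLite as a stand-in for the production database, so the column types are SQLite approximations of the schema above, and claim_item is a hypothetical helper name. The ON CONFLICT clause needs SQLite 3.24 or later.

```python
import sqlite3

# SQLite approximation of the schema in the text (UUID, JSONB, TIMESTAMPTZ become TEXT).
SCHEMA = """
CREATE TABLE bulk_idempotency (
    account_id    INTEGER NOT NULL,
    item_key      TEXT    NOT NULL,
    request_id    TEXT    NOT NULL,
    status        TEXT    NOT NULL,
    response_body TEXT,
    created_at    TEXT    DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (account_id, item_key)
)
"""

def claim_item(conn: sqlite3.Connection, account_id: int,
               item_key: str, request_id: str) -> bool:
    """Try to claim an item; True means this request inserted the row and owns it."""
    cur = conn.execute(
        "INSERT INTO bulk_idempotency (account_id, item_key, request_id, status) "
        "VALUES (?, ?, ?, 'in_progress') "
        "ON CONFLICT (account_id, item_key) DO NOTHING",
        (account_id, item_key, request_id),
    )
    return cur.rowcount == 1  # zero rows inserted means another request got there first

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first_claim = claim_item(conn, 1, "invoice:12345", "req-a")    # wins the claim
second_claim = claim_item(conn, 1, "invoice:12345", "req-b")   # same account and key: loses
other_account = claim_item(conn, 2, "invoice:12345", "req-c")  # same key, other account: wins
```

The third call succeeds because the key is scoped by account, so two customers can reuse the same item key without colliding.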

The per-item processing loop: for each item in the request, attempt to insert the idempotency row with status='in_progress'. If the insert succeeds, process the item, then update the row with status='completed' and the response. If the insert affects no rows (the row already exists), read the existing row. If its status is 'completed', return the cached response; if its status is 'in_progress', report a 409 Conflict for that item, because another concurrent retry is still processing it. The pattern handles concurrent retries safely by serializing on the unique constraint.
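The loop can be sketched with an in-memory dictionary standing in for the idempotency table, where dict.setdefault plays the role of the conflict-ignoring insert. The names process_items and flaky are hypothetical. Releasing the claim when an item fails is an assumption of this sketch; the text does not specify what happens to a failed item's row.

```python
from typing import Callable

def process_items(store: dict, account_id: int, request_id: str,
                  items: list[tuple[str, dict]],
                  execute: Callable[[dict], dict]) -> list[dict]:
    results = []
    for item_key, payload in items:
        new_row = {"request_id": request_id, "status": "in_progress", "response": None}
        # setdefault stores new_row only if the key is absent, like the insert in the text.
        row = store.setdefault((account_id, item_key), new_row)
        if row is new_row:
            # This request won the claim, so it executes the item.
            try:
                row["response"] = execute(payload)
                row["status"] = "completed"
                results.append({"key": item_key, "status": "succeeded",
                                "response": row["response"]})
            except Exception as exc:
                # Assumption: drop the claim so that a later retry can re-attempt the item.
                del store[(account_id, item_key)]
                results.append({"key": item_key, "status": "failed", "error": str(exc)})
        elif row["status"] == "completed":
            results.append({"key": item_key, "status": "skipped-as-duplicate",
                            "response": row["response"]})
        else:
            # Row exists but is unfinished: the 409 Conflict case from the text.
            results.append({"key": item_key, "status": "conflict",
                            "error": "item is still being processed"})
    return results

store: dict = {}
calls = {"n": 0}

def flaky(payload: dict) -> dict:
    calls["n"] += 1
    if calls["n"] == 2:  # the second execution overall fails once
        raise RuntimeError("timeout")
    return {"invoice_for": payload["order_id"]}

items = [("order-1", {"order_id": 1}), ("order-2", {"order_id": 2})]
first = process_items(store, 7, "req-1", items, flaky)
retry = process_items(store, 7, "req-2", items, flaky)
```

In this run the first call succeeds for order-1 and fails for order-2; the retry skips order-1 as a duplicate and executes only order-2, which is the resume-from-failure behavior the pattern is meant to provide.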

The response shape question

Bulk endpoint responses need to communicate per-item outcomes, not just the overall batch outcome. The standard pattern is an array of result objects, one per item, with status and either a response body or an error. The HTTP status code reflects the overall batch outcome: 200 if all items succeeded, 207 (Multi-Status) for partial success, 4xx for client-level errors that prevented any items from being processed, 5xx for server errors.

The per-item result object should include the idempotency key, the status (succeeded, failed, skipped-as-duplicate), the response body or error, and optionally the item position in the request array. The skipped-as-duplicate status is the explicit signal to the client that an item was deduplicated from a previous request rather than re-executed; clients can use this to verify that retries are doing what is expected.
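A sketch of the result object and the batch-level status code follows. The field names are illustrative. Two choices here are assumptions rather than anything the text prescribes: a skipped duplicate counts as a success when choosing the batch status, and a batch in which every item failed still returns 207. Request-level 4xx and 5xx errors are assumed to be decided before this code runs.

```python
from typing import Optional

# Assumption: a deduplicated item counts as a success for the batch-level status.
SUCCESS_LIKE = {"succeeded", "skipped-as-duplicate"}

def item_result(key: str, status: str, position: int,
                body: Optional[dict] = None, error: Optional[str] = None) -> dict:
    """Build one entry of the per-item results array."""
    result = {"idempotency_key": key, "status": status, "position": position}
    if error is not None:
        result["error"] = error
    else:
        result["body"] = body
    return result

def batch_http_status(results: list[dict]) -> int:
    """200 when no item failed, otherwise 207 Multi-Status."""
    if all(r["status"] in SUCCESS_LIKE for r in results):
        return 200
    return 207

results = [
    item_result("order-1", "skipped-as-duplicate", 0, body={"invoice_for": 1}),
    item_result("order-2", "succeeded", 1, body={"invoice_for": 2}),
    item_result("order-3", "failed", 2, error="template not found"),
]
status = batch_http_status(results)
```

Including the position alongside the key lets a client match results back to its request array even when keys are server-derived.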

The cleanup question

Idempotency records have a retention cost. Per-batch idempotency at typical batch volumes produces modest record counts; per-item idempotency at high item volumes produces large counts. The standard pattern is a TTL on idempotency records, typically 24 hours, enforced by periodic cleanup. The rationale is that retries beyond that window are uncommon, and requiring the client to use a new key after 24 hours is an acceptable contract.

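The cleanup implementation should be partition-drop or chunked-delete rather than naive DELETE. Daily partitioning is the simplest pattern: each day's idempotency records live in their own partition, and a partition is dropped once every record in it is older than the TTL. With a 24-hour TTL, that means dropping the partition from two days earlier at midnight; dropping the previous day's partition would discard records that are only minutes old. The pattern keeps cleanup cost low and avoids the index bloat that accumulates with bulk DELETE.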
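Which daily partitions are safe to drop can be computed with a small helper. This is a sketch that assumes the retention window must be honored in full for every record, so a partition becomes droppable only when its newest possible record has aged past the TTL. The partition naming scheme and function names are hypothetical, and days are taken to be UTC days.

```python
from datetime import date, datetime, timedelta, timezone

def partition_name(day: date) -> str:
    """Hypothetical naming scheme: one partition per UTC day."""
    return f"bulk_idempotency_{day:%Y%m%d}"

def droppable_partitions(existing_days: list[date], now: datetime,
                         ttl: timedelta) -> list[str]:
    """Return partitions whose newest possible record is at least `ttl` old."""
    droppable = []
    for day in sorted(existing_days):
        # The newest record in a daily partition was written just before the next midnight.
        end_of_day = datetime.combine(day + timedelta(days=1),
                                      datetime.min.time(), tzinfo=timezone.utc)
        if end_of_day + ttl <= now:
            droppable.append(partition_name(day))
    return droppable

midnight = datetime(2024, 3, 10, tzinfo=timezone.utc)
days = [date(2024, 3, 7), date(2024, 3, 8), date(2024, 3, 9)]
to_drop = droppable_partitions(days, midnight, timedelta(hours=24))
```

At midnight on March 10 with a 24-hour TTL, the partitions for March 7 and March 8 are droppable, while March 9 is kept because it may still hold records written minutes earlier.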

What to do about quotas and rate limits

Bulk endpoints need to interact correctly with usage limits. The wrong implementation counts the bulk request as a single API call for all purposes, which lets customers get 1000 items of work for the price of one call. The right implementation counts items against quotas and requests against rate limits, with separate accounting. The response headers should reflect both: X-RateLimit-Remaining for request count, X-Quota-Items-Remaining for item count.

The interaction with per-item idempotency is subtle: skipped duplicates should not count against item quotas, since the customer is not consuming new work. The implementation requires the quota deduction to happen after item processing, not at request entry, and to deduct only for items that were actually executed rather than for items skipped due to deduplication.
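The separate accounting can be sketched as a post-processing step over the per-item results. The Usage class and account_for_batch function are hypothetical, and counting failed items as executed work is an assumption of this sketch; some APIs may prefer to charge only for successes.

```python
from dataclasses import dataclass

@dataclass
class Usage:
    requests_remaining: int  # rate limit, counted per request
    items_remaining: int     # quota, counted per executed item

def account_for_batch(usage: Usage, results: list[dict]) -> dict[str, str]:
    """Deduct one request and the executed items, then build the response headers."""
    # Assumption: failed items were executed and therefore count; skipped duplicates do not.
    executed = sum(1 for r in results if r["status"] != "skipped-as-duplicate")
    usage.requests_remaining -= 1
    usage.items_remaining -= executed
    return {
        "X-RateLimit-Remaining": str(usage.requests_remaining),
        "X-Quota-Items-Remaining": str(usage.items_remaining),
    }

usage = Usage(requests_remaining=100, items_remaining=5000)
results = [
    {"status": "succeeded"},
    {"status": "succeeded"},
    {"status": "skipped-as-duplicate"},
    {"status": "failed"},
]
headers = account_for_batch(usage, results)
```

Because the deduction runs after processing, a retry that is mostly duplicates costs one request against the rate limit and almost nothing against the item quota.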

Across our four products

We use per-item idempotency for bulk endpoints in DocuMint batch invoice generation, WebhookVault bulk endpoint creation, and FlagBit bulk flag rules import. The implementation cost was higher than per-batch would have been, but the customer-experience cost of partial-failure recovery was the deciding factor. CronPing bulk monitor creation also uses per-item, with client-supplied idempotency keys per monitor since natural identity is not always available.

The deeper observation is that bulk endpoints are not just scaled-up single endpoints. They have failure modes single endpoints do not have, and the question of how to handle partial failure is the central design decision. APIs that get this right are pleasant to integrate with at scale; APIs that get it wrong force customers to build their own item-tracking infrastructure on top, which becomes a maintenance burden for both sides.
