Designing Bulk APIs: Batch Operations That Don't Drown Your Database
Bulk endpoints look simple from the outside and turn into operational landmines from the inside. The honest design space involves transactions, partial failures, idempotency, and quota accounting in ways the marketing copy never mentions.
Customers ask for bulk endpoints constantly. They want to send a thousand invoices in one call, register five hundred webhooks in one shot, or evaluate a feature flag for ten thousand users in a single request. From the API consumer's perspective the request is obvious: one network round trip beats a thousand. From the API provider's perspective it raises design questions about failure, retries, and load that the simple-looking endpoint conceals.
The mistake we see most often in early-stage SaaS APIs is exposing a bulk endpoint that internally just loops over the single-item handler. This works fine in development with three-item batches and tears your database apart in production with thousand-item batches. The right design starts from the question: what does partial failure mean for this operation, and how does the caller find out about it?
Three honest options for transaction semantics
The first design decision is whether the batch is atomic. There are three honest options and your API needs to pick one and document it loudly.
All-or-nothing wraps the entire batch in a database transaction. If any item fails, the whole batch rolls back. This is what callers usually want when they say "atomic" — they want either every invoice created or none of them. It is also operationally dangerous because a single item with a constraint violation kills a thousand-item batch, the transaction can hold locks for seconds, and the failure mode at scale is "deploy a slightly bad client and our database lock waits explode."
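A minimal sketch of the all-or-nothing option, using Python's built-in sqlite3 module so that it runs anywhere. The invoices table, its columns, and the function name create_invoices_atomic are assumptions made for illustration; a production system would use its own schema and database driver.

```python
import sqlite3

def create_invoices_atomic(conn, invoices):
    """Insert every (number, amount_cents) row, or none of them."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            conn.executemany(
                "INSERT INTO invoices (number, amount_cents) VALUES (?, ?)",
                invoices,
            )
    except sqlite3.IntegrityError as exc:
        return {"status": "rolled_back", "error": str(exc)}
    return {"status": "committed", "created": len(invoices)}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (number TEXT PRIMARY KEY, amount_cents INTEGER NOT NULL)"
)

first = create_invoices_atomic(conn, [("INV-1", 1000), ("INV-2", 2500)])
# The second row below duplicates INV-1, so INV-3 is rolled back with it.
second = create_invoices_atomic(conn, [("INV-3", 400), ("INV-1", 999)])
row_count = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
```

After both calls the table still holds only the two rows from the first batch. The same property that makes this attractive is what makes it risky at scale: one duplicate key discards every other row in the batch.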
Best-effort with error reporting processes each item independently and returns a structured response listing which succeeded and which failed and why. This is what most public APIs actually do. It is harder for callers to use correctly but operationally much safer. The response shape matters here — return the items in the same order as the request, with both per-item status and a summary count, and make sure error reasons are structured enough to be machine-handled.
Two-phase validates the entire batch first, returns errors for any invalid items, then processes only the valid ones. This combines the upsides of the other two but doubles the request handling cost and produces awkward semantics when validation passes but processing fails (which it always eventually does).
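The two-phase flow can be sketched as follows. The validation rule, the status strings, and the create callback are all hypothetical; the point is only that phase two still needs its own error path, because a valid item can fail during processing.

```python
def validate(item):
    """Return a structured error dict, or None if the item is valid."""
    amount = item.get("amount_cents")
    if not isinstance(amount, int) or amount <= 0:
        return {"code": "invalid_amount", "field": "amount_cents"}
    return None

def process_two_phase(items, create):
    """Validate every item first, then process only the valid ones."""
    results = [None] * len(items)
    valid_indexes = []

    # Phase 1: validate the whole batch before touching storage.
    for i, item in enumerate(items):
        error = validate(item)
        if error is not None:
            results[i] = {"index": i, "status": "invalid", "error": error}
        else:
            valid_indexes.append(i)

    # Phase 2: process valid items; processing can still fail.
    for i in valid_indexes:
        try:
            results[i] = {"index": i, "status": "created", "id": create(items[i])}
        except Exception as exc:
            results[i] = {
                "index": i,
                "status": "failed",
                "error": {"code": "processing_error", "message": str(exc)},
            }
    return results
```

Callers end up handling three outcomes per item (invalid, failed, created), which is where the awkward semantics come from.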
For DocuMint we use best-effort. A bulk invoice generation request returns a JSON object with a results array, each entry having a status, an invoice_id if successful, and a structured error if not. We do not roll back successful invoices because of one bad input. We document this loudly because it is exactly the kind of decision that surprises careful callers if they assume defaults from other APIs.
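As an illustration of that response shape, here is a small sketch. Only the results array and the status, invoice_id, and error fields come from the description above; the summary block, the error codes, and the generate_invoice stand-in are assumptions, not DocuMint's actual implementation. The loop is there to show the response shape, not to recommend a persistence strategy.

```python
class InvoiceError(Exception):
    """Hypothetical structured error raised by the single-invoice path."""
    def __init__(self, code, message):
        super().__init__(message)
        self.code = code
        self.message = message

def generate_invoice(item):
    # Stand-in for real invoice generation.
    if item.get("amount_cents", 0) <= 0:
        raise InvoiceError("invalid_amount", "amount_cents must be positive")
    return "inv_" + item["customer"]

def bulk_generate(items):
    """Best-effort: one bad item never undoes the others."""
    results = []
    for item in items:  # results stay in request order
        try:
            results.append(
                {"status": "created", "invoice_id": generate_invoice(item)}
            )
        except InvoiceError as exc:
            results.append(
                {
                    "status": "failed",
                    "error": {"code": exc.code, "message": exc.message},
                }
            )
    succeeded = sum(1 for r in results if r["status"] == "created")
    return {
        "results": results,
        "summary": {
            "total": len(results),
            "succeeded": succeeded,
            "failed": len(results) - succeeded,
        },
    }
```

Because results are positional, a caller can zip the request items with the results array and retry only the failed entries.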
Idempotency for bulk operations
Single-item idempotency is well-understood: the caller passes an Idempotency-Key header, the server stores the response keyed on that header, replays of the same key return the cached response. Bulk operations expose a harder question: what is the unit of idempotency?
The two reasonable answers are batch-level and item-level. Batch-level idempotency means the entire request body has one key, and a replay returns the cached response for the whole batch. Item-level idempotency means each item carries its own key, and the batch response is computed item by item with cache lookups happening per item.
Batch-level is operationally simpler and has worse retry behavior. Suppose a batch of a thousand items succeeds for nine hundred and then times out before a response is stored or reaches the client. No cached response exists for the batch key, so the client's retry runs as a fresh batch attempt and re-creates the nine hundred successful items unless they are de-duplicated at the item level. Item-level is operationally more expensive, because you do a cache lookup per item, but it gives correct retry behavior: each item is independently de-duplicated regardless of how the batch boundaries shifted across attempts.
WebhookVault uses item-level idempotency on its bulk replay endpoint precisely because the failure mode of duplicate replays is annoying enough that we are willing to pay the per-item cache cost. The key is constructed from the customer ID plus a caller-supplied event ID, and the response carries through the cached delivery status if the item was already processed.
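A minimal sketch of item-level de-duplication with a key built from the customer ID and a caller-supplied event ID. The in-memory dict stands in for whatever shared cache a real service would use, and bulk_replay, deliver, and the cached field are hypothetical names, not WebhookVault's API.

```python
def bulk_replay(customer_id, event_ids, cache, deliver):
    """Replay each event at most once per (customer_id, event_id)."""
    results = []
    for event_id in event_ids:
        key = (customer_id, event_id)
        if key in cache:
            # Already processed in an earlier attempt: return the stored status.
            results.append(
                {"event_id": event_id, "delivery_status": cache[key], "cached": True}
            )
            continue
        status = deliver(event_id)
        cache[key] = status
        results.append(
            {"event_id": event_id, "delivery_status": status, "cached": False}
        )
    return results
```

If a first call covers events e1 and e2 and a retry covers e2 and e3, only e3 is delivered on the retry, even though the batch boundaries changed. This sketch ignores concurrent requests; a shared cache would need an atomic claim on the key to handle those.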
Quotas, rate limits, and the cost question
Single-item rate limits are usually expressed as requests per minute. Bulk operations break this model: a single request might process a thousand items, and counting it as one request against a sixty-per-minute limit understates the actual load by three orders of magnitude.
The honest fix is to count bulk operations against an item-based quota, not a request-based one. CronPing's bulk monitor creation counts each monitor against the customer's monthly monitor quota, regardless of whether they were created singly or in a batch of fifty. The bulk endpoint is a convenience, not a way to bypass the underlying quota.
Rate limiting is trickier because the caller wants to know in advance whether a batch will succeed. The pattern that works is to expose the per-item rate limit explicitly in the response, with X-RateLimit-Items-Remaining alongside the usual rate-limit headers, and to fail fast at the item level once the limit is hit, returning errors for the unprocessed items rather than silently dropping them.
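A sketch of item-level fail-fast under an item budget. The function name, the status strings, and the rate_limited error code are assumptions; only the X-RateLimit-Items-Remaining header name comes from the text above.

```python
def process_with_item_limit(items, items_remaining, handle):
    """Process items until the item budget is spent, then reject the rest."""
    results = []
    for index, item in enumerate(items):
        if items_remaining <= 0:
            # Fail fast: report the item as rejected instead of dropping it.
            results.append(
                {
                    "index": index,
                    "status": "rejected",
                    "error": {"code": "rate_limited"},
                }
            )
            continue
        items_remaining -= 1
        results.append({"index": index, "status": "ok", "result": handle(item)})
    headers = {"X-RateLimit-Items-Remaining": str(items_remaining)}
    return results, headers
```

Every request item appears in the response, so the caller knows exactly which items to resend once the budget resets.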
Backpressure and batch size limits
Even with all the above, an unbounded bulk endpoint is a denial-of-service waiting to happen. Set a maximum batch size that you have actually load-tested. For API endpoints with synchronous processing, sub-second response time at the size limit should be the gate. For endpoints that genuinely need to handle larger batches, accept the request, return a 202 with a job ID, and provide a status endpoint for polling. The synchronous path should top out at one or two hundred items in our experience; larger batches should be async by construction.
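The routing decision can be sketched in a few lines. The 200-item synchronous cap follows the rule of thumb above; the 10,000-item asynchronous cap and the choice of 413 for oversized batches are assumptions, and some APIs prefer 400 or 422 for the rejection.

```python
MAX_SYNC_ITEMS = 200       # load-tested synchronous ceiling (assumed value)
MAX_ASYNC_ITEMS = 10_000   # hard ceiling for accepted batches (assumed value)

def choose_path(item_count):
    """Return (http_status, mode) for a batch of the given size."""
    if item_count > MAX_ASYNC_ITEMS:
        # Reject the whole batch; never process the first N and drop the rest.
        return 413, "rejected"
    if item_count <= MAX_SYNC_ITEMS:
        return 200, "sync"
    # Too large for the synchronous path: accept and hand back a job ID.
    return 202, "async"
```

Keeping the limits as named constants makes it easy to return them in the rejection error, so callers learn the limit from the failure itself.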
The 202-with-job-ID pattern also gives you a place to expose progress, partial results during processing, and a cancellation endpoint. FlagBit's bulk targeting rule import works this way — you POST a CSV of up to ten thousand user IDs, get back a job ID, and poll for completion with a progress percentage and per-row error reporting.
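A minimal in-memory sketch of the job object behind such a status endpoint. The class name, field names, and chunked step method are hypothetical and are not FlagBit's implementation; a real service would persist the job and run step from a background worker.

```python
import uuid

class ImportJob:
    """Tracks progress, per-row errors, and cancellation for one import."""

    def __init__(self, rows):
        self.id = uuid.uuid4().hex
        self.rows = rows
        self.done = 0
        self.errors = []
        self.cancelled = False

    def step(self, handle_row, chunk=100):
        """Process up to `chunk` more rows unless the job was cancelled."""
        if self.cancelled:
            return
        end = min(self.done + chunk, len(self.rows))
        for i in range(self.done, end):
            try:
                handle_row(self.rows[i])
            except ValueError as exc:
                self.errors.append({"row": i, "message": str(exc)})
        self.done = end

    def cancel(self):
        # Cancelling a finished job is a no-op.
        if self.done < len(self.rows):
            self.cancelled = True

    def status(self):
        total = len(self.rows)
        if self.cancelled:
            state = "cancelled"
        elif self.done == total:
            state = "completed"
        else:
            state = "running"
        percent = 100 * self.done // total if total else 100
        return {
            "job_id": self.id,
            "state": state,
            "progress_percent": percent,
            "errors": list(self.errors),
        }
```

The POST handler would create the job and return its id with a 202; the polling endpoint simply serializes status().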
The tests that actually catch the bugs
Bulk endpoint bugs hide in the failure modes that are hard to reproduce manually. The tests that find them are: (1) a batch where every item fails — does the response shape stay sane? (2) a batch where the first item fails and the rest succeed — do you correctly report partial success? (3) a batch where every item is identical with the same idempotency key — do you return the cached result for all of them or process them all? (4) a batch large enough to cross your size limit — do you reject it with a useful error or process the first N and silently truncate? (5) a concurrent retry while the original is still processing — do you de-duplicate or double-process?
Each of these is a one-off test that takes ten minutes to write and saves a real production incident. The bugs they catch are the ones that would otherwise reach customer support tickets in a form that takes hours to diagnose.
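To make that concrete, here is a sketch of tests (1), (2), and (4) written against a deliberately tiny stand-in handler. The handler, its MAX_ITEMS limit, the status codes, and the error codes are all assumptions for illustration; the tests would target your real endpoint instead.

```python
MAX_ITEMS = 3  # kept tiny so the size-limit test stays readable

def bulk_handler(items):
    """Stand-in bulk endpoint: negative numbers fail, everything else succeeds."""
    if len(items) > MAX_ITEMS:
        return 413, {"error": {"code": "batch_too_large", "max_items": MAX_ITEMS}}
    results = [
        {"status": "ok"}
        if item >= 0
        else {"status": "failed", "error": {"code": "negative"}}
        for item in items
    ]
    failed = sum(1 for r in results if r["status"] == "failed")
    summary = {"succeeded": len(results) - failed, "failed": failed}
    return 200, {"results": results, "summary": summary}

def test_every_item_fails():
    status, body = bulk_handler([-1, -2])
    assert status == 200
    assert len(body["results"]) == 2  # shape stays sane
    assert body["summary"] == {"succeeded": 0, "failed": 2}

def test_first_fails_rest_succeed():
    _, body = bulk_handler([-1, 5, 6])
    assert [r["status"] for r in body["results"]] == ["failed", "ok", "ok"]
    assert body["summary"] == {"succeeded": 2, "failed": 1}

def test_oversized_batch_is_rejected_not_truncated():
    status, body = bulk_handler([1, 2, 3, 4])
    assert status == 413
    assert "results" not in body  # nothing was partially processed
    assert body["error"]["max_items"] == MAX_ITEMS

for test in (
    test_every_item_fails,
    test_first_fails_rest_succeed,
    test_oversized_batch_is_rejected_not_truncated,
):
    test()
```

Tests (3) and (5) need a real idempotency store and concurrent requests, so they are better written against an integration environment than a stub like this one.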
Where to learn more
Bulk operations sit at the intersection of database design, API design, and operational reliability. Each of our four products treats bulk operations differently because their underlying domains have different correctness requirements. DocuMint processes invoice batches with best-effort semantics. CronPing counts bulk-created monitors against an item-based quota. FlagBit uses async jobs for large targeting rule imports. WebhookVault uses item-level idempotency for bulk replay. The pattern that works is to pick the semantics that match your domain, document them loudly, and resist the temptation to make all four decisions identically just because the endpoints look similar from the outside.