Designing API Cursor-Based Bulk Operations: How to Process Millions of Rows Without Timing Out
Bulk API operations on millions of rows need different patterns than the small-batch endpoints most APIs ship. Cursor-based bulk operations let customers process unbounded result sets through repeatable, resumable, idempotent batches without timing out or losing track of progress.
Bulk operations on small datasets are easy. A customer has a hundred rows to update, sends a single POST with the array of items, gets back a response with per-item statuses, done. The patterns we covered in cycle 188 on bulk import endpoints handle this case well: per-item idempotency keys, atomic per-batch transactions, response shapes that report partial success.
Bulk operations on millions of rows are a different problem. A customer has 5 million records to import, or 50 million to update, or 200 million to delete based on a filter. A single request cannot handle this: it would exceed timeout limits, exhaust connection pools, run out of memory on either side, and produce a response too large to parse. The customer needs a way to process unbounded result sets through repeatable batches, and the API needs to provide that pattern without becoming a custom integration project for each customer.
The cursor-based bulk pattern
The shape that works is cursor-based: the customer initiates a bulk operation with a filter (or an explicit set of IDs), the API responds with a job ID, the customer polls for progress, and the work happens server-side at a controlled rate. The cursor concept from list-endpoint pagination applies here. Server-side state tracks position through the work, and the customer asks "what is the status of job X" rather than "process the next batch of items."
The minimum viable endpoint shape is:
- POST /bulk/operations with operation type, filter or ID list, and configuration parameters. Returns {job_id, status: 'pending', estimated_items, estimated_duration_seconds} with HTTP 202 Accepted.
- GET /bulk/operations/:job_id returns current status with {status, processed, succeeded, failed, skipped, total, started_at, eta_seconds}.
- GET /bulk/operations/:job_id/results returns paginated per-item results once the job is complete. The results endpoint is itself cursor-paginated because the result set can be millions of items.
- POST /bulk/operations/:job_id/cancel cancels an in-progress job with best-effort semantics: items already processed remain processed, items not yet started will not be started.
The four-endpoint shape gives customers what they need: a way to start large jobs without waiting for completion, a way to monitor progress, a way to retrieve detailed results, and a way to stop a job that is taking longer than expected or processing the wrong items.
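To make the job lifecycle concrete, here is a minimal in-memory sketch of the four operations with no HTTP layer. The names BulkService, Job, and work are hypothetical and exist only for illustration; a real implementation would back this with a durable job queue, and would likely withhold results until the job is complete.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Job:
    job_id: str
    items: list
    status: str = "pending"
    processed: int = 0
    results: list = field(default_factory=list)


class BulkService:
    """In-memory stand-in for the four endpoints (hypothetical, no HTTP)."""

    def __init__(self):
        self._jobs = {}
        self._ids = count(1)

    def create(self, items):
        # POST /bulk/operations: register the job and return immediately.
        job = Job(job_id=f"job_{next(self._ids)}", items=list(items))
        self._jobs[job.job_id] = job
        return {"job_id": job.job_id, "status": job.status,
                "estimated_items": len(job.items)}

    def status(self, job_id):
        # GET /bulk/operations/:job_id
        job = self._jobs[job_id]
        return {"status": job.status, "processed": job.processed,
                "total": len(job.items)}

    def results(self, job_id, cursor=0, limit=2):
        # GET /bulk/operations/:job_id/results, cursor-paginated.
        job = self._jobs[job_id]
        page = job.results[cursor:cursor + limit]
        end = cursor + limit
        next_cursor = end if end < len(job.results) else None
        return {"results": page, "next_cursor": next_cursor}

    def cancel(self, job_id):
        # POST /bulk/operations/:job_id/cancel: best effort, keeps prior work.
        job = self._jobs[job_id]
        if job.status in ("pending", "running"):
            job.status = "cancelled"
        return self.status(job_id)

    def work(self, job_id, max_items):
        # One worker step: process up to max_items, then yield control.
        job = self._jobs[job_id]
        if job.status not in ("pending", "running"):
            return
        job.status = "running"
        batch = job.items[job.processed:job.processed + max_items]
        for item in batch:
            job.results.append({"id": item, "status": "succeeded"})
            job.processed += 1
        if job.processed == len(job.items):
            job.status = "completed"
```

Because the worker checks the status before each step, a cancel takes effect at the next batch boundary, which matches the best-effort semantics described above.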
The async vs sync cutover
The same API should support synchronous bulk operations for small datasets and asynchronous bulk operations for large ones. The cutover is conventionally 100-1000 items: below the threshold, return the results inline with HTTP 200; above the threshold, return a job ID with HTTP 202 and require polling.
Some APIs make this implicit (the endpoint decides based on the request size), and some make it explicit (the customer sets an async: true parameter). Explicit is generally better: it removes the surprise where a customer's normal small-batch flow suddenly returns a job ID when their data grows, and it lets customers force async behavior when they know they want fire-and-forget semantics even for small batches.
The synchronous endpoint should not just be a wrapper around the async endpoint that polls until completion. The wrapper approach fails when a slightly-too-large batch times out at the proxy layer: the customer sees an error even though the underlying job goes on to complete successfully. The synchronous endpoint should have its own implementation with a hard size cap, and it should return 422 rather than 202 if the request is too large.
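A rough sketch of the explicit cutover, assuming a cap of 1000 items for the synchronous path. SYNC_MAX_ITEMS, dispatch, and the placeholder job ID are illustrative names, not part of any existing API.

```python
SYNC_MAX_ITEMS = 1000  # assumed cap; pick a value in the 100-1000 range


def dispatch(items, is_async=False):
    """Return (http_status, body) for a bulk request."""
    if is_async:
        # Explicit opt-in: always enqueue, even for small batches.
        # "job_placeholder" stands in for a real generated job ID.
        return 202, {"job_id": "job_placeholder", "status": "pending"}
    if len(items) > SYNC_MAX_ITEMS:
        # Reject instead of silently switching the caller to a job ID.
        return 422, {"error": "too_many_items", "max_items": SYNC_MAX_ITEMS}
    # Small batch: process inline and return per-item results.
    results = [{"id": item, "status": "succeeded"} for item in items]
    return 200, {"results": results}
```

The caller never receives a 202 unless it asked for one, so a small-batch integration cannot be surprised by a job ID as its data grows.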
Filter vs ID list inputs
The bulk operation needs a way to specify what items to operate on. The two common patterns are filter-based (operate on all items matching a query) and ID-list-based (operate on these specific items).
Filter-based is more powerful but introduces a consistency question: what happens if items match the filter when the job starts but no longer match by the time the job processes them? The two reasonable answers are snapshot semantics (capture the item list at job start, ignore subsequent changes) and live semantics (re-evaluate the filter as the job runs). Snapshot semantics are more predictable; live semantics are more useful for some workflows. The right default is snapshot, with an opt-in flag for live.
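The difference between the two semantics can be sketched as follows. This is a simplified illustration: the store is a plain dict, the function names are hypothetical, and "live" here only re-checks items captured at job start (it does not pick up items that begin matching later).

```python
def snapshot_ids(store, predicate):
    """Capture the IDs that match the filter at job start."""
    return [item_id for item_id, rec in store.items() if predicate(rec)]


def process(store, ids, predicate, apply, live=False):
    """Apply `apply` to each captured item; return (processed, skipped)."""
    processed, skipped = [], []
    for item_id in ids:
        rec = store.get(item_id)
        # Deleted items are skipped in both modes. In live mode, items
        # that no longer match the filter are skipped as well.
        if rec is None or (live and not predicate(rec)):
            skipped.append(item_id)
            continue
        apply(rec)
        processed.append(item_id)
    return processed, skipped
```

With snapshot semantics an item that changed after job start is still processed; with the live flag it is reported as skipped, which is why the skipped count belongs in the status response.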
ID-list-based is simpler but limited by the request size: 10,000 IDs is a reasonable maximum for a single POST body, and customers with more items need to make multiple requests. The pattern that works for very large ID lists is upload-then-reference: the customer uploads a file containing IDs to a separate endpoint, gets back a file ID, then references the file ID in the bulk operation request.
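For customers who stay on the ID-list path, the client-side splitting looks roughly like this. The 10,000 limit is the figure suggested above; chunk_ids is a hypothetical helper.

```python
MAX_IDS_PER_REQUEST = 10_000  # the per-request maximum suggested in the text


def chunk_ids(ids, size=MAX_IDS_PER_REQUEST):
    """Yield consecutive slices of `ids`, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]
```

Each chunk becomes its own job submission, which is exactly the bookkeeping burden that upload-then-reference removes for very large lists.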
Rate limiting and quotas
Bulk operations interact awkwardly with normal rate limits. A 5-million-item bulk operation does not make 5 million API requests, so request-based rate limits do not apply, but it does consume 5 million quota items, so item-based quotas do apply.
The pattern that works is to count quota usage at the item level inside the job, deducting from the customer's quota as items are processed. The job will pause if it exhausts the customer's quota and resume when the quota refills (typically at month boundaries). The customer can monitor quota usage via the job status endpoint, which should include quota information.
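A minimal sketch of item-level quota accounting, assuming one quota unit per item. The function name and return shape are illustrative; a real worker would persist the position and read the quota from a shared store.

```python
def process_with_quota(items, quota_remaining, start=0):
    """Process items[start:], spending one quota unit per item.

    Returns (next_index, quota_remaining, status). A "paused" status
    means the job can resume from next_index once the quota refills.
    """
    index = start
    while index < len(items):
        if quota_remaining == 0:
            return index, quota_remaining, "paused"
        # The actual per-item work would happen here.
        quota_remaining -= 1
        index += 1
    return index, quota_remaining, "completed"
```

Returning the position along with the status is what makes the pause resumable: the next run passes next_index back in as start.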
The bulk operation itself should be rate limited at the job-creation level, not per item. A customer should not be able to start 1000 bulk jobs simultaneously; the limit is typically 1-10 concurrent jobs per account, with a queue for additional submissions. The concurrency limit prevents customers from saturating shared infrastructure and provides natural backpressure when the job processing system is under load.
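The concurrency limit with a queue can be sketched like this. JobAdmission is a hypothetical in-memory class and the limit of 2 is only an example; production code would need persistence and locking.

```python
from collections import deque


class JobAdmission:
    """Per-account concurrency limit with a FIFO queue for overflow."""

    def __init__(self, max_concurrent=2):
        self.max_concurrent = max_concurrent
        self.running = {}  # account -> set of running job IDs
        self.queued = {}   # account -> deque of waiting job IDs

    def submit(self, account, job_id):
        running = self.running.setdefault(account, set())
        if len(running) < self.max_concurrent:
            running.add(job_id)
            return "running"
        self.queued.setdefault(account, deque()).append(job_id)
        return "queued"

    def finish(self, account, job_id):
        """Mark a job done and promote the next queued job, if any."""
        self.running[account].discard(job_id)
        queue = self.queued.get(account)
        if queue:
            promoted = queue.popleft()
            self.running[account].add(promoted)
            return promoted
        return None
```

Queued submissions start only when a running job finishes, which is the backpressure described above.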
The progress reporting requirement
Long-running jobs need progress reporting that customers can use for both display and automation. The basic information is processed-vs-total counts, but customers also need success/failure breakdowns, recent error patterns, and estimated time to completion.
The minimum schema for progress reporting is the counts (processed, succeeded, failed, skipped, total), the timing fields (started_at, last_updated_at, eta_seconds), the status enum (pending, running, paused, cancelled, completed, failed), and an error summary that includes the most recent failed items with their failure reasons. The error summary is the highest-leverage debugging tool for customers whose jobs are mostly succeeding but with some failures—they need to know what is failing without retrieving the full results endpoint.
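One way to model the counts and the error summary is shown below. The class and field names are assumptions, the timing fields are left out for brevity, and the summary is capped at three entries purely for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class JobProgress:
    total: int
    status: str = "pending"
    succeeded: int = 0
    failed: int = 0
    skipped: int = 0
    # Bounded buffer: only the most recent failures are retained.
    recent_errors: deque = field(default_factory=lambda: deque(maxlen=3))

    @property
    def processed(self):
        return self.succeeded + self.failed + self.skipped

    def record_failure(self, item_id, reason):
        self.failed += 1
        self.recent_errors.append({"item_id": item_id, "reason": reason})

    def to_response(self):
        return {
            "status": self.status,
            "processed": self.processed,
            "succeeded": self.succeeded,
            "failed": self.failed,
            "skipped": self.skipped,
            "total": self.total,
            "error_summary": list(self.recent_errors),
        }
```

Keeping the error buffer bounded means the status response stays small no matter how many items fail; the full list remains available from the results endpoint.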
The ETA calculation is harder than it looks. Naive ETAs based on linear extrapolation from current progress can be wildly wrong when later items are systematically slower than earlier items (a common pattern when the processing order correlates with the amount of work per item). The pattern that works reasonably well is to compute the ETA from the recent rate (the last few minutes of progress) rather than the overall rate, smoothed with an exponential moving average to reduce noise.
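A small sketch of the smoothed-rate approach. EtaEstimator and the default smoothing factor of 0.3 are assumptions for illustration; the right factor depends on how often progress is sampled.

```python
class EtaEstimator:
    """ETA from an exponential moving average of the recent rate."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # weight given to the newest sample
        self.rate = None    # smoothed throughput in items per second

    def update(self, items_done, seconds):
        """Feed the items completed during the latest sampling window."""
        if seconds <= 0:
            return
        sample = items_done / seconds
        if self.rate is None:
            self.rate = sample
        else:
            self.rate = self.alpha * sample + (1 - self.alpha) * self.rate

    def eta_seconds(self, remaining):
        """Return the estimated seconds left, or None if no rate is known."""
        if not self.rate:
            return None
        return remaining / self.rate
```

For example, with alpha set to 0.5, a window at 100 items per second followed by one at 50 items per second gives a smoothed rate of 75, so 750 remaining items yield an ETA of 10 seconds.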
Idempotency at the job level
Bulk operations need idempotency the same way single-item operations do—a customer should be able to retry a job submission without creating duplicate work. The pattern that works is an idempotency key on the job submission: the customer provides a key, the server stores the key with the job, and a duplicate submission with the same key returns the existing job ID rather than creating a new job.
The idempotency key should be scoped to the customer account and have a TTL that exceeds the maximum expected job duration. For jobs that typically complete in hours, a 24-hour idempotency key TTL is conservative. For jobs that can run for days, a longer TTL is needed.
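The job-level key handling might look like the following sketch. IdempotencyStore is a hypothetical in-memory class; a real one would live in the database alongside the job record, and the injectable clock is there only to make expiry testable.

```python
import time


class IdempotencyStore:
    """Account-scoped idempotency keys with a TTL (in-memory sketch)."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # (account, key) -> (job_id, expires_at)

    def get_or_create(self, account, key, create_job):
        """Return (job_id, created). Unexpired duplicates reuse the job."""
        now = self.clock()
        entry = self._entries.get((account, key))
        if entry is not None and entry[1] > now:
            return entry[0], False
        job_id = create_job()
        self._entries[(account, key)] = (job_id, now + self.ttl)
        return job_id, True
```

Keying on the (account, key) pair means two customers can use the same key string without colliding, and an expired key simply creates a fresh job.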
Per-item idempotency within a job is a separate question. The recommended pattern is that each item in a bulk operation has an effective item-level idempotency key derived from the job ID and the item identifier. The internal processing path uses these item-level keys to ensure that a job that is paused, resumed, or partially retried does not double-process items.
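A sketch of deriving and using item-level keys, under the assumption that a set of completed keys is persisted somewhere durable. The hashing scheme and function names are illustrative choices, not a prescribed format.

```python
import hashlib


def item_key(job_id, item_id):
    """Derive a fixed-width key from the job ID and item identifier."""
    # The length prefix keeps ("a:b", "c") distinct from ("a", "b:c").
    raw = f"{len(job_id)}:{job_id}:{item_id}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def process_once(job_id, item_ids, done_keys, handler):
    """Run handler on each item not already recorded in done_keys."""
    applied = 0
    for item_id in item_ids:
        key = item_key(job_id, item_id)
        if key in done_keys:
            continue  # already handled before a pause or retry
        handler(item_id)
        done_keys.add(key)
        applied += 1
    return applied
```

In a real system the handler call and the key write would need to commit atomically; otherwise a crash between the two can still double-process an item.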
Three patterns that fail
First, the synchronous-bulk-with-no-async-fallback pattern. APIs that only support synchronous bulk operations limit their customers to whatever fits in a single request timeout. Customers with larger datasets either chunk the work themselves (badly, usually) or move to a different service. The async pattern is more work to build but pays off the first time a customer has more data than fits in 30 seconds of processing.
Second, the no-progress-reporting pattern. Jobs that just return a status of "in_progress" with no details leave customers unable to estimate when their work will complete or whether anything is going wrong. The result is customer support tickets asking "is my job stuck?" that the support team cannot answer without checking server-side logs. The cost of building richer progress reporting is small compared to the support load it eliminates.
Third, the no-cancel-endpoint pattern. Long-running jobs that cannot be cancelled cause customer pain in proportion to job length. The customer started a job with the wrong filter, or with the wrong destination, or with the wrong configuration, and has to wait for the job to complete (potentially hours) before they can run the corrected version. The cancel endpoint is high-value and yet one of the most frequently omitted features in the bulk-operations API surface.
Where we stand
Our four products are early-stage and have not faced customer bulk operations at the millions-of-items scale. DocuMint has batch invoice generation that operates on up to 1000 invoices in a single request; CronPing has bulk monitor operations on up to 500 monitors; FlagBit has bulk flag updates of up to 100 flags per request; WebhookVault has bulk endpoint configuration changes of up to 50 endpoints per request.
The cursor-based bulk pattern documented here is the planned migration path when any of the four products hits customer demand for larger operations. The investment is the four-endpoint shape (POST/GET/GET/POST), the underlying job queue infrastructure (Postgres jobs table with SKIP LOCKED claim, worker pool with concurrency control, progress tracking), and the dashboard surface for customers to monitor jobs without programmatic access.
The deeper observation about bulk API operations is that they are a different product surface than single-item operations, not a quantitative extension. The bulk-operations surface has its own conceptual model (jobs as first-class entities), its own failure modes (partial completion, mid-job cancellation, resumability), its own monitoring requirements (progress reporting, error summaries, ETA calculation), and its own customer support patterns. Treating bulk operations as just "the small endpoint but with more items" produces APIs that work for the documented examples and fall apart for actual customer use. The investment in proper bulk-operation infrastructure pays back when customers grow into the scale where they need it, and not before.
Our products: DocuMint (PDF invoice generation API), CronPing (cron job monitoring with status pages), FlagBit (feature flags API for modern teams), and WebhookVault (webhook capture and replay) put these patterns into production.