Designing API Bulk Import Endpoints: Patterns for Customer Migration

Bulk import is the endpoint that wins or loses migration deals. A customer evaluating your API has a CSV from their old vendor and wants to know how painful it is to get into your system. The shape of the import endpoint determines the answer.

Bulk import is an endpoint that nobody designs early, because nobody has any data yet, and everybody designs late, because by then customers depend on whatever ad hoc tooling exists and a real bulk import endpoint means undoing those workarounds. As a result, bulk import endpoints in most APIs feel like they were added under deadline pressure: poorly documented, with partially specified error semantics, opaque progress reporting, and arbitrary size limits set by whatever the original implementer's load test happened to handle.

The cost of getting bulk import wrong is not just developer ergonomics. Bulk import is the endpoint that customers hit during evaluation, when they are deciding whether to migrate from an existing vendor. The shape and quality of the import experience determine the answer to "how painful is this going to be," which is often the deciding question in a SaaS evaluation. We have iterated through several bulk import designs across DocuMint, CronPing, FlagBit, and WebhookVault, and the patterns that hold up are about taking the migration use case seriously rather than treating bulk as a hastily bolted-on cousin of single-item create.

The synchronous-vs-async cutover

The first design decision is whether the import endpoint is synchronous (the response body returns the result of every imported item) or asynchronous (the response returns a job ID that the client polls for completion). The two have different sweet spots and the wrong choice produces predictable pain.

Synchronous import is right for small batches, roughly 100 to 200 items. The latency is bounded enough that the client can wait, the response shape is straightforward, and most HTTP intermediaries (load balancers, proxies, browser fetch APIs) handle the request without timing out. The cost is that the synchronous endpoint cannot honestly handle larger imports without either timing out or holding a request thread for minutes.

Async import is the right answer once batches reach the thousands. The client posts the import payload, gets a 202 response with a job ID, polls a status endpoint or receives a webhook on completion, and downloads the results from a separate endpoint. The cost is more moving pieces; the benefit is that the operation can take as long as it actually takes and the client can survive transient network issues.

The pattern that fails is forcing a single endpoint to do both, because the client cannot tell in advance which response shape it will get. Once the threshold is set, document it clearly: "imports of up to 200 items are processed synchronously; imports larger than 200 must use the asynchronous bulk endpoint." Customers handle a clear constraint better than ambiguity about which behavior they will get.
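A minimal sketch of enforcing that threshold at the synchronous endpoint follows. The function name, the error body, and the choice of HTTP 413 are illustrative assumptions, not something the designs above prescribe; the point is only that the rejection is explicit and happens before any work is done.

```python
from typing import Any

SYNC_LIMIT = 200  # threshold quoted in the text; tune to your own latency budget


def handle_sync_import(items: list[dict[str, Any]]) -> tuple[int, dict[str, Any]]:
    """Return (HTTP status, response body) for a hypothetical synchronous endpoint."""
    if len(items) > SYNC_LIMIT:
        # Reject before processing anything, and tell the client where to go instead.
        # 413 is an assumed status code; a 400 or 422 would be equally defensible.
        return 413, {
            "error": "batch_too_large",
            "limit": SYNC_LIMIT,
            "received": len(items),
            "hint": "use the asynchronous bulk endpoint for larger imports",
        }
    # Placeholder for the real create path: every item is reported individually.
    results = [{"index": i, "status": "created"} for i in range(len(items))]
    return 200, {"results": results}
```

A client that sends 201 items gets a deterministic rejection naming the limit, rather than a response whose shape depends on batch size.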

Per-item idempotency, not per-batch

The temptation is to treat the entire import as one operation with one idempotency key. The customer retries the import, you check whether you have seen the key before, and you either run it or skip it.

This is wrong for the realistic failure modes. Imports fail partway through: network drops, individual rows fail validation, the worker crashes after processing 700 of 1000 items. The customer wants to retry just the failed items, not the whole batch. If the idempotency is per-batch, the retry creates a new batch with new idempotency, the customer has to re-upload the entire CSV, and the items that succeeded the first time are processed again or rejected as duplicates.

Per-item idempotency is the right shape. Each row in the import payload includes an idempotency key (often a customer-supplied external ID like an invoice number, a cron job name, or a flag key). If a row with that key already exists for this customer, the import treats it as a no-op and includes it in the response with status="exists". If it does not exist, it is created. The customer can resubmit the entire CSV after a partial failure and the system handles it correctly.

The implementation has cost: per-item idempotency requires either a unique constraint on the (customer_id, external_id) pair or a per-customer dedup table. The cost pays off the first time a customer's network drops in the middle of a 50,000-row import.
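The unique-constraint variant can be sketched with Python's built-in sqlite3 module. The table name, column names, and the import_rows helper are hypothetical; only the (customer_id, external_id) constraint and the "exists" status come from the description above.

```python
import sqlite3


def import_rows(conn: sqlite3.Connection, customer_id: str, rows: list[dict]) -> list[dict]:
    """Insert each row at most once per (customer_id, external_id)."""
    results = []
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO items (customer_id, external_id, payload) VALUES (?, ?, ?)",
                (customer_id, row["external_id"], row["payload"]),
            )
            status = "created"
        except sqlite3.IntegrityError:
            # The unique constraint fired, so this key was already imported.
            # (A fuller version would distinguish other constraint failures.)
            status = "exists"
        results.append({"external_id": row["external_id"], "status": status})
    conn.commit()
    return results


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE items (
           customer_id TEXT NOT NULL,
           external_id TEXT NOT NULL,
           payload     TEXT NOT NULL,
           UNIQUE (customer_id, external_id)
       )"""
)
rows = [
    {"external_id": "INV-1", "payload": "a"},
    {"external_id": "INV-2", "payload": "b"},
]
first = import_rows(conn, "cust-1", rows)
second = import_rows(conn, "cust-1", rows)  # full resubmission after a partial failure
```

In this sketch the second call reports every row as "exists" and the table still holds two rows, which is the behavior that lets a customer safely resubmit the whole file.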

The all-or-nothing question

The import endpoint has to take a position on what happens when some items succeed and some fail. There are three honest answers, each with a clear use case:

Best-effort imports process every valid item and report individual failures. The response includes a per-item status (created, updated, skipped, failed-with-reason). This is the right default because it matches what customers actually want from a migration: get as much data in as possible, then look at what failed and fix it.

All-or-nothing imports either commit every item or commit none of them. This is right when the items have referential dependencies on each other and a partial import would leave the data in an inconsistent state. The implementation requires holding a transaction across all items or using a staging area.

Stop-on-first-error imports process items in order until the first failure, then stop. This is right when the customer needs to handle errors in a specific order and is rare enough in our domain that we have not implemented it.

The choice should be a parameter on the request, with a documented default. Forcing a single behavior surprises customers; making it implicit in the endpoint name is even worse because the same operation can need different semantics in different contexts.
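As a rough illustration, the request parameter might look like the sketch below. The parameter name mode, its two values, and the run_import helper are assumptions. The all-or-nothing branch here only validates up front and writes to an in-memory list; a real implementation would need the transaction or staging area mentioned above.

```python
from typing import Callable, Optional

MODES = ("best_effort", "all_or_nothing")


def run_import(
    rows: list[dict],
    validate: Callable[[dict], Optional[str]],  # returns an error message or None
    store: list[dict],
    mode: str = "best_effort",  # the documented default
) -> list[dict]:
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    errors = [validate(row) for row in rows]
    # In all_or_nothing mode a single invalid row rejects the whole batch.
    abort = mode == "all_or_nothing" and any(e is not None for e in errors)

    results = []
    for index, (row, error) in enumerate(zip(rows, errors)):
        if error is not None:
            status = "failed"
        elif abort:
            status = "skipped"  # valid, but not written because the batch was rejected
        else:
            store.append(row)
            status = "created"
        results.append({"index": index, "status": status, "reason": error})
    return results
```

Either way the response keeps the same per-item shape, so client code that reads statuses does not need to branch on the mode.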

Quotas and rate limits

The bulk endpoint exists in tension with rate limits. If the API has a per-second rate limit of 100 requests, a bulk import of 10,000 items either has to count as one request (which lets a customer bypass the rate limit by always batching) or has to count as 10,000 requests (which makes bulk import unusable).

The pattern that works is to count items, not requests, against quota. The bulk endpoint debits 10,000 items from the customer's monthly quota; the per-request rate limit governs how often the customer can submit batches, not how many items each batch contains. The quota model has to be designed around the actual unit of value (items, in our case) rather than the technical unit of HTTP requests.

Internal rate limiting is a separate concern. The bulk endpoint should not be allowed to consume so much database capacity that other customers are affected. The implementation pattern is a per-customer concurrency limit on bulk import workers and a global queue depth that triggers backpressure.
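The two checks can be sketched as a single admission step. All names here (CustomerLimits, admit_bulk_job, the default of two concurrent jobs) are hypothetical, and the sketch is single-process and not thread-safe; a real service would keep these counters in a shared store and add the global queue-depth check.

```python
from dataclasses import dataclass


@dataclass
class CustomerLimits:
    monthly_item_quota: int
    items_used: int = 0
    max_concurrent_jobs: int = 2  # assumed per-customer worker cap
    running_jobs: int = 0


def admit_bulk_job(limits: CustomerLimits, item_count: int) -> tuple[bool, str]:
    """Decide whether a bulk import may start; debit quota in items, not requests."""
    if limits.running_jobs >= limits.max_concurrent_jobs:
        return False, "too_many_concurrent_imports"
    if limits.items_used + item_count > limits.monthly_item_quota:
        return False, "quota_exceeded"
    limits.items_used += item_count
    limits.running_jobs += 1
    return True, "accepted"


def finish_bulk_job(limits: CustomerLimits) -> None:
    """Release the concurrency slot when a job completes or fails."""
    limits.running_jobs -= 1
```

A 10,000-item batch debits 10,000 from the quota regardless of how many HTTP requests carried it, while the concurrency cap protects shared database capacity.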

Validation: pre-flight vs in-flight

A common request from customers evaluating a bulk import endpoint is "can I check whether my CSV is valid before I commit to importing it." The pattern that handles this is a separate POST /bulk-import/validate endpoint that takes the same payload, runs all the validation logic, and returns a per-item validation report without writing anything. The customer can iterate on their CSV until validation passes, then run the actual import with confidence that no row will be rejected by validation.

The implementation requires factoring the validation logic out of the create path so it can be called twice. The cost is small relative to the customer experience improvement; the failure mode without it is that customers learn the validation rules through a series of failed imports.
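One way to picture that factoring: a single validate_row function shared by a dry-run path and the create path. The field names and rules below are invented for illustration, and the store is an in-memory list standing in for the database.

```python
def validate_row(row: dict) -> list[str]:
    """Single source of truth for validation rules (the rules here are examples)."""
    errors = []
    if not row.get("external_id"):
        errors.append("external_id is required")
    amount = row.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    return errors


def validate_batch(rows: list[dict]) -> list[dict]:
    """Dry run, as a validate endpoint might return it. Writes nothing."""
    return [{"index": i, "errors": validate_row(row)} for i, row in enumerate(rows)]


def import_batch(rows: list[dict], store: list[dict]) -> list[dict]:
    """Create path: reuses the same validation, then writes the valid rows."""
    results = []
    for report, row in zip(validate_batch(rows), rows):
        if report["errors"]:
            status = "failed"
        else:
            store.append(row)
            status = "created"
        results.append({"index": report["index"], "status": status, "errors": report["errors"]})
    return results
```

Because both paths call validate_row, a payload that passes the dry run cannot be rejected by application-level validation during the real import; database-level failures remain possible.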

Status reporting and progress

Async imports need a status endpoint that reports more than "in progress" and "done." The customer who started an import of 50,000 items wants to know how far along it is, how many items have succeeded, how many have failed, and how long the rest is likely to take.

The shape we have settled on is a status response with: total items, processed items, succeeded count, failed count, started_at, estimated_completion (if available), and a results URL that can be downloaded once the job is complete. The results URL points to a CSV or JSON file with one row per imported item and the per-item status; this is more useful than embedding the results in the status response because it can be downloaded once and used for offline error analysis.
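The fields listed above map naturally onto a small record type. In this sketch the JSON key names are assumptions, and the estimate is a simple linear extrapolation from throughput so far, which is one possible approach rather than the method used in the products named earlier.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ImportStatus:
    job_id: str
    total_items: int
    processed_items: int
    succeeded: int
    failed: int
    started_at: datetime
    results_url: Optional[str] = None  # set once the job is complete

    def estimated_completion(self, now: datetime) -> Optional[datetime]:
        """Linear extrapolation; None when there is no basis for an estimate."""
        if self.processed_items == 0 or self.processed_items >= self.total_items:
            return None
        elapsed = now - self.started_at
        remaining = self.total_items - self.processed_items
        return now + elapsed * (remaining / self.processed_items)

    def to_response(self, now: datetime) -> dict:
        eta = self.estimated_completion(now)
        return {
            "job_id": self.job_id,
            "total_items": self.total_items,
            "processed_items": self.processed_items,
            "succeeded": self.succeeded,
            "failed": self.failed,
            "started_at": self.started_at.isoformat(),
            "estimated_completion": eta.isoformat() if eta else None,
            "results_url": self.results_url,
        }
```

Returning None for the estimate when nothing has been processed yet keeps the field honest, matching the "if available" qualifier above.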

The CSV question

Customers want to import CSVs because their existing data lives in CSVs. The honest answer is that the API should accept a structured payload (JSON, ideally) and the client tooling should handle CSV-to-JSON conversion. CSV has too many edge cases (quoting rules, escape characters, encoding ambiguities, type inference) to handle reliably at the API layer.

The pattern we use is to provide a CSV-to-JSON conversion utility in our CLI and SDKs, document the JSON schema clearly, and accept JSON in the API. Customers who insist on CSV can use the conversion utility or write their own; we have not had complaints about this, because the customers who care about this care about it enough to want the conversion to be deterministic.
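A conversion utility of this kind can be quite small. The sketch below uses Python's standard csv and json modules; the function name and the {"items": [...]} envelope are assumptions. It deliberately leaves every value as a string, so no type inference happens and the output is deterministic for a given input.

```python
import csv
import io
import json


def csv_to_json_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text with a header row into a list of string-valued dicts."""
    # newline="" leaves line endings untouched so the csv module can handle
    # newlines embedded in quoted fields.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return [dict(row) for row in reader]


csv_text = 'external_id,name\nINV-1,"Acme, Inc."\nINV-2,Widgets\n'
rows = csv_to_json_rows(csv_text)
payload = json.dumps({"items": rows})  # body for the JSON import endpoint
```

The quoted comma in "Acme, Inc." is handled by the CSV parser on the client side, which is exactly the class of edge case the API layer no longer has to own.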

Five tests that catch the bugs that bite

The tests that exercise the realistic failure modes:

  1. Submit a 1000-item batch and verify per-item statuses match the actual outcomes (no silent successes or failures).
  2. Submit the same batch twice and verify that the second submission produces all "exists" statuses and creates no duplicates.
  3. Submit a batch with one row that violates a unique constraint at the database level (not just application validation) and verify the rest of the batch processes correctly.
  4. Submit a batch larger than the documented limit and verify the rejection happens before any items are processed (not after partial work).
  5. Submit a batch, kill the worker partway through, restart the worker, and verify the import resumes correctly without double-processing.

These tests catch most of the bugs that show up under real customer load. The tests that do not catch enough are the ones that exercise only the happy path; bulk import is the endpoint where the unhappy paths matter most.
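Test 5 is the one most often skipped, so here is a simplified sketch of it. The worker is modeled as a plain function over an in-memory dict and the crash as an exception; WorkerCrash, process_batch, and the crash_after hook are all hypothetical test scaffolding. The resume step relies on per-item idempotency keyed by external_id.

```python
class WorkerCrash(Exception):
    """Simulates the worker process dying mid-batch."""


def process_batch(rows, store, crash_after=None):
    """Create rows keyed by external_id; rows whose key already exists are no-ops."""
    statuses = []
    for n, row in enumerate(rows):
        if crash_after is not None and n == crash_after:
            raise WorkerCrash(f"worker died after {n} items")
        key = row["external_id"]
        if key in store:
            statuses.append("exists")
        else:
            store[key] = row
            statuses.append("created")
    return statuses


def test_resume_after_crash():
    rows = [{"external_id": f"row-{i}"} for i in range(1000)]
    store = {}
    try:
        process_batch(rows, store, crash_after=700)
    except WorkerCrash:
        pass
    assert len(store) == 700  # partial progress survived the crash

    statuses = process_batch(rows, store)  # restart and replay the whole batch
    assert statuses.count("exists") == 700
    assert statuses.count("created") == 300
    assert len(store) == 1000  # nothing was double-processed
```

A real version would kill an actual worker process and inspect the database, but the assertions stay the same: the first 700 items are untouched and only the remaining 300 are created.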

The deeper observation

Bulk import is one of those features whose quality is invisible until a customer is in the middle of an import that is going wrong. When it works smoothly, nobody notices; when it fails partway through with cryptic errors and no resumability, the customer's first impression of the product is a frustrating evaluation experience.

The investment in a good bulk import endpoint pays back specifically at evaluation time, when prospects are deciding whether to migrate. A customer who imports 50,000 items in a single afternoon and sees clear status reporting and per-item error handling forms a different opinion of the product than a customer who fights with a half-broken bulk endpoint for two days. The shape of the endpoint is a product decision, not a technical detail, and the patterns that hold up are about taking the customer's migration experience as the design constraint.
