Designing API Project Export Endpoints: Data Portability That Customers Actually Use

Export endpoints are a contract about what data the customer can take with them. The customer cases that drive the requirement are competitive migration, internal duplication, account closure, and audit. The right shape for each is different.

Most B2B SaaS products eventually grow an export endpoint. The customer-facing reason is data portability: the customer should be able to take their data somewhere else if they choose. The internal reason is that exports prevent a class of complaint that is hard to defend against, which is that the platform is a roach motel where data goes in and never comes out. The shape of the export endpoint is more consequential than it looks. Different customer cases want different shapes, and trying to satisfy all of them with a single download button produces an export that none of them actually use.

The four customer cases

The customers who actually use export endpoints are doing one of four things, and each of them wants something different.

The first case is competitive migration. The customer has decided to leave the platform and wants their data in a format that the competitor can import. The competitor's import format matters more than the original platform's preferred export format. JSON is usually the right baseline because most platforms can parse it. CSV is useful for tabular data. The customer running this case wants completeness and a stable schema. They are willing to do work on their side to map fields.

The second case is internal duplication. The customer is setting up a second account, a sandbox copy, or a backup. They want the export to round-trip back into the same platform. The format should match the platform's own import format. The schema can be platform-specific because both sides are the same platform. The customer wants the round-trip to preserve everything, including IDs and timestamps and metadata that the competitive-migration case does not care about.

The third case is account closure. The customer is closing the account and wants a record of what was in it. The format does not need to round-trip. The completeness requirement is high. The customer may not look at the export for years, so the format should be self-documenting enough to be useful without context. PDF reports for human-readable summaries plus JSON or CSV for the underlying data is a common shape.

The fourth case is audit and compliance. The customer needs to demonstrate to a regulator, auditor, or internal stakeholder what data the platform holds about them. The format should be reproducible (the same export today and a month from now should produce the same content for the same data), timestamped, and signed if the audit requires it. JSON with explicit metadata blocks works well.
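A minimal sketch of the audit shape in Python, under stated assumptions: an HMAC with a shared secret stands in for whatever signing scheme the audit actually requires, and the function names and metadata field names are hypothetical. Reproducibility here comes from canonical serialization, so the same data always produces the same content digest. The timestamp lives in the metadata block and is kept out of that digest.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def canonical_json(value) -> bytes:
    # Sorted keys and fixed separators make the byte output deterministic.
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def build_audit_export(records: list, secret: bytes, generated_at: datetime) -> dict:
    # The content digest depends only on the records, not on when the export ran.
    content_digest = hashlib.sha256(canonical_json(records)).hexdigest()
    metadata = {
        "generated_at": generated_at.astimezone(timezone.utc).isoformat(),
        "record_count": len(records),
        "content_sha256": content_digest,
    }
    # Assumption: HMAC-SHA256 over the metadata block is an acceptable signature.
    signature = hmac.new(secret, canonical_json(metadata), hashlib.sha256).hexdigest()
    return {"metadata": metadata, "signature": signature, "records": records}


def verify_audit_export(export: dict, secret: bytes) -> bool:
    # Recompute the digest from the records and compare with the signed metadata.
    digest = hashlib.sha256(canonical_json(export["records"])).hexdigest()
    if digest != export["metadata"]["content_sha256"]:
        return False
    expected = hmac.new(secret, canonical_json(export["metadata"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), str(export["signature"]).encode())
```

Two exports of the same data taken a month apart would differ in generated_at and signature but share the same content_sha256, which is the property an auditor can check.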

The point of the four-case breakdown is that no single export format is right for all of them. A single JSON download is closest to the competitive-migration case. An account-closure case wants more documentation. An audit case wants more metadata. An internal-duplication case wants exact round-trip semantics. A platform that ships one export and treats all four cases as the same is shipping the wrong export for at least three of them.

The minimum viable surface

The minimum viable export surface has three endpoints. POST /exports takes the customer's request and returns an export_id with HTTP 202. GET /exports/{export_id} returns the status (pending, running, completed, failed, or expired) and, once the export is ready, a download URL. GET /exports/{export_id}/download delivers the actual file, either by streaming it or by redirecting to a storage URL.

The asynchronous shape is important even when the data is small enough to generate synchronously. Customers with large accounts have exports that take minutes. Customers with small accounts have exports that complete in seconds. A single endpoint that sometimes returns 200 with the file and sometimes returns 202 with a job ID is confusing to integrate against. Always async, even for small payloads, is the simpler customer-facing contract.
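The always-async contract can be sketched without committing to a web framework. The in-memory class below is a hypothetical illustration, not a production design: the class and method names are assumptions, each method returns an (HTTP status code, body) pair, and a real service would persist jobs and run them on a worker.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExportStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    EXPIRED = "expired"


@dataclass
class ExportJob:
    export_id: str
    status: ExportStatus = ExportStatus.PENDING
    download_url: Optional[str] = None


class ExportService:
    """Hypothetical in-memory stand-in for the export endpoints."""

    def __init__(self) -> None:
        self._jobs = {}

    def create_export(self) -> tuple:
        # POST /exports: always 202 with a job ID, regardless of account size.
        job = ExportJob(export_id=str(uuid.uuid4()))
        self._jobs[job.export_id] = job
        return 202, {"export_id": job.export_id, "status": job.status.value}

    def get_export(self, export_id: str) -> tuple:
        # GET /exports/{export_id}: status, plus the URL only once completed.
        job = self._jobs.get(export_id)
        if job is None:
            return 404, {"error": "export not found"}
        body = {"export_id": job.export_id, "status": job.status.value}
        if job.status is ExportStatus.COMPLETED:
            body["download_url"] = job.download_url
        return 200, body

    def mark_running(self, export_id: str) -> None:
        self._jobs[export_id].status = ExportStatus.RUNNING

    def mark_completed(self, export_id: str, download_url: str) -> None:
        job = self._jobs[export_id]
        job.status = ExportStatus.COMPLETED
        job.download_url = download_url
```

A client integrates against one flow only: create, poll until the status is completed or failed, then download. A small account simply reaches completed on the first or second poll.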

The download URL should be a presigned URL to object storage where possible, rather than streaming through the application server. The export file is potentially large. The application server has better things to do than serve hundreds of megabytes of CSV over a slow client connection. Object storage with a time-limited signed URL is the right pattern. The TTL on the URL should be long enough that the customer can click the email link a day later (24 hours is a common default) but not so long that the URL becomes a permanent backdoor.
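The idea behind a time-limited signed URL can be shown with a short sketch. This is not the signing scheme of any particular object store, all of which provide their own presigning APIs; the function names, query parameter names, and HMAC construction here are assumptions chosen for illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # the 24-hour default mentioned above


def _signature(path: str, expires: int, secret: bytes) -> str:
    # Bind the signature to both the file path and the expiry time.
    message = f"{path}\n{expires}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_download_url(base_url: str, path: str, secret: bytes, now: int,
                      ttl_seconds: int = DEFAULT_TTL_SECONDS) -> str:
    expires = now + ttl_seconds
    query = urlencode({"expires": expires, "signature": _signature(path, expires, secret)})
    return f"{base_url}{path}?{query}"


def is_download_url_valid(url: str, secret: bytes, now: int) -> bool:
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False  # the link has outlived its TTL
    expected = _signature(parts.path, expires, secret)
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
```

Because the expiry is part of the signed message, a customer cannot extend the link by editing the expires parameter, and a leaked link stops working once the TTL passes.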

The format decision

JSON is the right default for the competitive-migration and audit cases. It is parseable by every language, supports nested structure naturally, and produces files that are human-readable enough for diagnostic purposes. JSON Lines (one JSON object per line) is the right variant when the export is large enough that loading the whole file into memory is a concern, since it supports streaming parse.
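A small sketch of the JSON Lines variant, using only the Python standard library. The function names are hypothetical. The writer emits one object per line, and the reader yields one record at a time, so neither side has to hold the whole export in memory.

```python
import io
import json
from typing import IO, Iterable, Iterator


def write_jsonl(records: Iterable[dict], out: IO[str]) -> int:
    """Write one JSON object per line and return the number of records written."""
    count = 0
    for record in records:
        # json.dumps escapes newlines inside strings, so each record stays on one line.
        out.write(json.dumps(record, ensure_ascii=False))
        out.write("\n")
        count += 1
    return count


def read_jsonl(source: IO[str]) -> Iterator[dict]:
    """Yield records one at a time instead of parsing the whole file."""
    for line in source:
        line = line.strip()
        if line:  # tolerate blank lines
            yield json.loads(line)
```

The same functions work against an open file handle or a network stream; io.StringIO is enough to try them out locally.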

CSV is the right complement for tabular data that customers want to load into spreadsheets or business intelligence tools. CSV has real limits: it has no native representation for nested structure, it cannot distinguish an empty string from a missing value without a convention, and its quoting and escaping rules vary between tools, especially for values that contain the delimiter or a line break. The platform's CSV exporter has to make explicit choices about all of those things and document them. A common pattern is JSON as the canonical export and CSV as an alternative shape for the parts of the data that are naturally tabular.
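The explicit choices can be made visible in the exporter itself. The sketch below uses Python's csv module; the specific conventions (comma delimiter, every field quoted, a \N marker for null, nested values serialized as JSON text) are illustrative assumptions, not a recommendation from this essay, and the function names are hypothetical.

```python
import csv
import io
import json

NULL_MARKER = r"\N"  # assumed convention: distinguishes null from an empty string


def to_cell(value):
    if value is None:
        return NULL_MARKER
    if isinstance(value, (dict, list)):
        # Nested structure is flattened into a JSON string inside one cell.
        return json.dumps(value, sort_keys=True)
    return value


def write_csv(rows, columns, out) -> None:
    writer = csv.writer(
        out,
        delimiter=",",
        quotechar='"',
        quoting=csv.QUOTE_ALL,  # quote every field so delimiters in values are safe
        lineterminator="\r\n",
    )
    writer.writerow(columns)
    for row in rows:
        # A missing key is treated the same as an explicit null in this sketch.
        writer.writerow([to_cell(row.get(col)) for col in columns])
```

Whatever conventions are chosen, the point is that they are written down once in code and once in the export documentation, so a customer loading the file does not have to reverse-engineer them.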

The platform-specific format for the round-trip case is usually a superset of the JSON format with additional metadata. The customer using this case is already on the platform; they will accept a format that other platforms cannot read. The reciprocal import endpoint should accept the same format.

The completeness contract

The most important property of an export endpoint is not the format or the speed but the completeness contract. The customer who uses the export should be able to say, with confidence, that the export contains everything the platform has about them.

Completeness is harder than it looks. The platform has the customer's primary resource data, which is obvious. It also has audit logs, billing history, configuration, webhook delivery records, comments, attachments, derived state, and so on. Each of those categories is a decision about whether to include it in the export. The exclusions should be explicit and documented. An export that silently omits webhook delivery records is going to surprise the customer who wanted webhook delivery records.

The documentation should be a separate page from the API reference. The export documentation should list every category of data the platform stores about the customer and say, for each category, whether it is included in the export and in what format. The list should be updated when new categories are added. A platform that adds a feature without updating the export documentation has silently shipped an incomplete export.
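One way to keep that list from drifting is to hold the per-category decisions in code and check them against the categories the platform actually stores. The sketch below is a hypothetical illustration: the category names, the CategoryDecision type, and both functions are assumptions, and a real platform would derive the stored categories from its own schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CategoryDecision:
    included: bool
    format: Optional[str] = None


# Hypothetical decisions; every stored category needs an explicit entry.
EXPORT_DECISIONS = {
    "primary_resources": CategoryDecision(True, "json"),
    "audit_logs": CategoryDecision(True, "jsonl"),
    "billing_history": CategoryDecision(True, "csv"),
    "webhook_deliveries": CategoryDecision(False),
}


def undocumented_categories(stored, decisions) -> list:
    """Return stored categories that have no export decision yet."""
    return sorted(set(stored) - set(decisions))


def render_documentation(decisions) -> str:
    """Produce the per-category list for the export documentation page."""
    lines = []
    for name in sorted(decisions):
        decision = decisions[name]
        if decision.included:
            lines.append(f"{name}: included as {decision.format}")
        else:
            lines.append(f"{name}: not included")
    return "\n".join(lines)
```

Running undocumented_categories in a test suite turns "someone forgot to update the export page" into a failing build when a new category such as comments ships without a decision.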

Three patterns that fail

The first pattern that fails is the synchronous export. A GET /export endpoint that streams the file directly is fine for small accounts and times out for large ones. The customer has no way to know in advance which side of the boundary they are on. The fix is to make the endpoint always asynchronous regardless of size.

The second pattern that fails is the proprietary format. An export in a format that only the platform itself can parse is a competitive-migration export that does not work for competitive migration. The primary format should be a widely readable one: JSON, CSV, or XML for the data, with PDF reserved for human-readable summaries, and JSON as the default. A custom format is fine as a supplement but not as the primary export.

The third pattern that fails is the silently-incomplete export. The customer downloads what they assume is a complete export, uses it for migration, and discovers six months later that a category of data was missing. The fix is the explicit documentation, plus a manifest in the export itself that lists what was included.
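A manifest can be as simple as a list of the files in the export with sizes and checksums, plus the categories that were deliberately left out. The sketch below is one possible shape under stated assumptions: the field names, file names, and functions are hypothetical.

```python
import hashlib


def build_manifest(files: dict, excluded: list) -> dict:
    """files maps a file name to its bytes; excluded lists omitted categories."""
    return {
        "manifest_version": 1,
        "included": [
            {"file": name, "bytes": len(data), "sha256": hashlib.sha256(data).hexdigest()}
            for name, data in sorted(files.items())
        ],
        "excluded_categories": sorted(excluded),
    }


def verify_manifest(manifest: dict, files: dict) -> list:
    """Return a list of problems; an empty list means the export matches its manifest."""
    problems = []
    listed = {entry["file"]: entry for entry in manifest["included"]}
    for name in sorted(set(listed) | set(files)):
        if name not in files:
            problems.append(f"missing file: {name}")
        elif name not in listed:
            problems.append(f"unlisted file: {name}")
        elif hashlib.sha256(files[name]).hexdigest() != listed[name]["sha256"]:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

The excluded_categories list matters as much as the included one: it lets the customer see at download time, rather than six months later, that a category was never in the export.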

What export endpoints do not have to do

The export does not have to produce a file that some specific competitor can import. Different competitors have different import formats; the platform cannot maintain mappings to all of them. The export provides standard JSON, and the customer (or the competitor's onboarding team) maps it.

The export does not have to be free or unlimited. Many platforms include exports in their standard plan but rate-limit them. A customer who runs daily full-data exports imposes a more expensive workload than a customer who exports once at account closure. Rate limits on the export endpoint with documented thresholds (for example, one export per day or ten exports per month) are reasonable.
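The example thresholds above can be enforced with a small sliding-window limiter. This is a sketch under stated assumptions: the class name is hypothetical, state is kept in memory, and "month" is simplified to a rolling 30-day window rather than a calendar month.

```python
from collections import defaultdict, deque

DAY = 24 * 60 * 60
MONTH = 30 * DAY  # simplification: rolling 30-day window, not a calendar month


class ExportRateLimiter:
    def __init__(self, per_day: int = 1, per_month: int = 10) -> None:
        self.per_day = per_day
        self.per_month = per_month
        self._history = defaultdict(deque)  # account_id -> timestamps of accepted exports

    def try_acquire(self, account_id: str, now: float) -> bool:
        history = self._history[account_id]
        # Drop exports that have aged out of the monthly window.
        while history and now - history[0] >= MONTH:
            history.popleft()
        in_last_day = sum(1 for t in history if now - t < DAY)
        if in_last_day >= self.per_day or len(history) >= self.per_month:
            return False
        history.append(now)
        return True
```

A rejected request would typically map to HTTP 429 with a message that names the documented threshold, so the customer knows when the next export will be accepted.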

The export does not have to include data the platform does not have. A platform that stores customer credit card details only as Stripe customer IDs cannot include the card numbers in the export because the platform does not have them. The completeness contract is about what the platform has, not about what the customer might want.

The deeper observation about export endpoints is that they are one of the surfaces where customer trust is built or lost. A platform with a confident, complete, well-documented export endpoint signals that the customer is in control of their data. A platform with a half-built or undocumented export signals the opposite. The investment pays back in customer-facing trust more than in retention metrics.

