Designing API Resource Expansion: When to Embed and When to Reference
API responses with nested resources can either embed the full child or return a reference the client follows separately. The decision shapes latency, payload size, and customer integration complexity. The right answer is usually neither default.
When an API returns a resource that has related resources, the design question is what to do with the related ones. A customer record has a billing address. An invoice has line items. A monitor has recent ping results. The response can embed the full related resource inline, return only a reference (an ID or URL the client follows), or do something in between. The choice has consequences for latency, payload size, customer integration complexity, and the API's ability to evolve.
The textbook answers do not converge. JSON:API recommends explicit references with sparse fieldsets and an "include" parameter for embedding. HAL recommends embedded resources with hypermedia links. GraphQL recommends letting the client request the exact shape. Stripe uses an explicit "expand" parameter with default-collapsed references. None of these is the right answer in all cases, and the choice depends on how customers actually use the API.
What customers actually do
Customer integration patterns cluster into three shapes. The dashboard backend pattern fetches a list and renders each item with several related resources visible. Without expansion, this is the N+1 pattern: one request for the list, then one more per item for each related resource, so a 50-item list with two related resources per item costs 101 requests. With expansion, it is a single request. The latency improvement is large, often 10-50x on the user-perceived response time.
The webhook handler pattern processes one event at a time and may or may not need related resources depending on event type. For most events, no expansion is needed; the event has the data the handler cares about. For some events (typically those involving multiple objects), expansion is genuinely useful.
The data export pattern paginates through many records and writes them to a destination. The export wants every field of every record but does not need related resources expanded (the related resources are themselves separate records that will be fetched in their own pagination). Expansion in this pattern is wasted bandwidth.
The same API serves all three patterns, and the right expansion default is different for each. This is why the explicit opt-in pattern (return references by default, let clients ask for expansion) tends to win: the default is correct for the data-export case and the webhook case, and the dashboard case can opt in.
The expand parameter pattern
The Stripe-style expand parameter is the most widely adopted answer to this question. The default response includes references (typically string IDs); clients pass expand=field1,field2.nested_field to request expansion. Each expanded field is replaced inline with the full resource.
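As a rough illustration of the request and response shapes, the sketch below parses an expand value into field paths and shows the same invoice collapsed and expanded. The function name parse_expand and the invoice and customer fields are hypothetical, chosen for the example rather than taken from any real API.

```python
from typing import List, Optional, Tuple


def parse_expand(raw: Optional[str]) -> List[Tuple[str, ...]]:
    """Turn 'customer,subscription.plan' into [('customer',), ('subscription', 'plan')]."""
    if not raw:
        return []
    paths: List[Tuple[str, ...]] = []
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas such as "a,,b"
        path = tuple(item.split("."))
        if path not in paths:  # drop duplicates, keep first-seen order
            paths.append(path)
    return paths


# Default response: the related resource is a string ID (hypothetical data).
collapsed = {"id": "inv_1", "total": 4200, "customer": "cus_9"}

# With expand=customer: the same key now holds the full object.
expanded = {
    "id": "inv_1",
    "total": 4200,
    "customer": {"id": "cus_9", "name": "Acme Ltd"},
}
```

Keeping the key name identical in both shapes means client code only has to check whether the value is a string or an object.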
The implementation has three constraints. First, the expand allowlist is per-resource and per-field: not every field can be expanded, and the documented list defines the contract. Second, expansion depth is bounded: typically one or two levels, not arbitrary depth. Third, nested expansion is never implicit: if you expand a field and the expanded resource has its own expandable fields, they remain unexpanded unless you explicitly request the nested path.
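One way to enforce these constraints is to walk each requested path against a per-resource allowlist before touching the database. This is a minimal sketch: the EXPANDABLE table, the MAX_DEPTH value of 2, and the ExpandError class are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical allowlist: resource type -> {expandable field: type it resolves to}.
EXPANDABLE: Dict[str, Dict[str, str]] = {
    "invoice": {"customer": "customer", "subscription": "subscription"},
    "customer": {"billing_address": "address"},
    "subscription": {"plan": "plan"},
    "address": {"country": "country"},
}
MAX_DEPTH = 2  # assumed bound; the right value depends on the API


class ExpandError(ValueError):
    """Raised when a requested expansion path is not part of the contract."""


def validate_expand(resource_type: str, paths: Iterable[Tuple[str, ...]]) -> None:
    for path in paths:
        dotted = ".".join(path)
        if len(path) > MAX_DEPTH:
            raise ExpandError(f"'{dotted}' exceeds the maximum expansion depth of {MAX_DEPTH}")
        current = resource_type
        for field in path:
            allowed = EXPANDABLE.get(current, {})
            if field not in allowed:
                raise ExpandError(f"'{field}' is not expandable on {current} (in '{dotted}')")
            current = allowed[field]  # step into the type the field resolves to
```

Because only the paths the client spelled out are validated and later resolved, anything nested below them stays a plain reference.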
The schema implementation: store the field as a foreign key in the database, return the foreign key value in the response by default, and replace the foreign key with the resolved object when expansion is requested. The database work is a simple SELECT when expansion is not requested and a JOIN (or a second, batched lookup) when it is. The performance benefit of not joining unless needed is substantial for high-traffic endpoints.
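The foreign-key approach might look like the following with Python's built-in sqlite3 module. The table layout, column names, and the get_customer function are assumptions for the sketch, not a description of any particular product's schema.

```python
import sqlite3
from typing import Any, Dict, Optional


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory schema with one customer and one address."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE addresses (id TEXT PRIMARY KEY, city TEXT);
        CREATE TABLE customers (
            id TEXT PRIMARY KEY,
            name TEXT,
            billing_address TEXT REFERENCES addresses(id)
        );
        INSERT INTO addresses VALUES ('addr_1', 'Lisbon');
        INSERT INTO customers VALUES ('cus_1', 'Acme Ltd', 'addr_1');
    """)
    return conn


def get_customer(conn: sqlite3.Connection, customer_id: str,
                 expand_address: bool = False) -> Optional[Dict[str, Any]]:
    if not expand_address:
        # Default path: plain SELECT, the foreign key is returned as-is.
        row = conn.execute(
            "SELECT id, name, billing_address FROM customers WHERE id = ?",
            (customer_id,),
        ).fetchone()
        if row is None:
            return None
        return {"id": row[0], "name": row[1], "billing_address": row[2]}

    # Expanded path: JOIN so the related row arrives in the same query.
    row = conn.execute(
        """SELECT c.id, c.name, a.id, a.city
           FROM customers c
           LEFT JOIN addresses a ON a.id = c.billing_address
           WHERE c.id = ?""",
        (customer_id,),
    ).fetchone()
    if row is None:
        return None
    address = None if row[2] is None else {"id": row[2], "city": row[3]}
    return {"id": row[0], "name": row[1], "billing_address": address}
```

The LEFT JOIN keeps customers without a billing address in the result; an inner join would silently drop them when expansion is requested.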
The include parameter alternative
JSON:API recommends a different shape: keep the main resource clean (always references), and include expanded resources in a separate "included" array. Clients deduplicate by ID. This solves a real problem (when a list returns 100 items all referencing the same parent, you do not want the parent embedded 100 times) but introduces complexity for the common case (single-resource responses with one or two expansions, where the deduplication is unnecessary).
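To make the trade-off concrete, here is a simplified comparison of the two shapes. It is only loosely modeled on JSON:API compound documents (a conforming document also carries type, attributes, and relationships members), and the account field and function names are hypothetical.

```python
from typing import Any, Dict, List

Resource = Dict[str, Any]


def inline_response(items: List[Resource], parents: Dict[str, Resource]) -> List[Resource]:
    """Stripe-style: every item embeds its own copy of the parent."""
    return [{**item, "account": parents[item["account"]]} for item in items]


def compound_response(items: List[Resource], parents: Dict[str, Resource]) -> Dict[str, Any]:
    """Included-array style: items keep IDs, each parent appears once."""
    included: List[Resource] = []
    seen = set()
    for item in items:
        parent_id = item["account"]
        if parent_id not in seen:
            seen.add(parent_id)
            included.append(parents[parent_id])
    return {"data": items, "included": included}


def lookup_included(doc: Dict[str, Any], resource_id: str) -> Resource:
    """The extra client-side step that the inline shape avoids."""
    return next(r for r in doc["included"] if r["id"] == resource_id)
```

With 100 items sharing one parent, the compound form carries the parent once instead of 100 times, but every client has to implement something like lookup_included.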
For B2B SaaS APIs the included array is usually wrong. The deduplication wins are real but small for typical query shapes, and the additional client-side parsing complexity is a real cost. The Stripe-style inline expansion produces simpler customer code and is the right default unless the deduplication wins are large for your specific access patterns.
What expansion should not do
The expand parameter is a read-only convenience. It should not affect what mutations the API accepts: a PATCH or POST request body should always use IDs for references, not embedded objects, because embedding objects in mutation bodies creates ambiguity about whether the mutation should update the related resource or just reference it.
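A request validator can enforce the IDs-only rule for mutation bodies. The sketch below is one possible shape; the REFERENCE_FIELDS tuple and the choice to allow null for clearing a reference are assumptions.

```python
from typing import Any, Dict, List

# Hypothetical list of fields that hold references to other resources.
REFERENCE_FIELDS = ("customer", "subscription")


def reference_errors(body: Dict[str, Any]) -> List[str]:
    """Return one message per reference field that is not an ID string."""
    errors: List[str] = []
    for field in REFERENCE_FIELDS:
        if field not in body or body[field] is None:
            continue  # absent, or null to clear the reference (assumed policy)
        if not isinstance(body[field], str):
            errors.append(
                f"'{field}' must be a resource ID string, "
                f"got {type(body[field]).__name__}"
            )
    return errors
```

Rejecting an embedded object outright, rather than quietly extracting its id, keeps a client from believing that its edits to the nested object were saved.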
Expansion should not produce surprising status codes. A valid expand value should never change the status code that the same request would return without it. For expand=invalid_field there are two defensible behaviors: ignore the invalid field (the lenient option) or return a 400 with a clear message about which field was invalid (the strict option). We prefer the strict option because the lenient option produces silent bugs in customer code.
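The difference between the two options is easiest to see with a misspelled field. The following sketch assumes a hypothetical allowlist and error body shape; neither is taken from a real API.

```python
from typing import Any, Dict, List, Set, Tuple

ALLOWED_EXPANSIONS: Set[str] = {"event", "subscription"}  # hypothetical allowlist


def check_expand(requested: List[str], strict: bool = True) -> Tuple[int, Dict[str, Any]]:
    """Return (status_code, body) for the expand-validation step only."""
    unknown = [field for field in requested if field not in ALLOWED_EXPANSIONS]
    if unknown and strict:
        return 400, {
            "error": {
                "code": "invalid_expand",  # assumed error code
                "message": "Cannot expand: " + ", ".join(unknown),
                "fields": unknown,
            }
        }
    # Lenient mode: unknown fields are dropped without telling the client.
    return 200, {"expand": [f for f in requested if f in ALLOWED_EXPANSIONS]}
```

In lenient mode a typo such as expand=subscripton returns a 200 with the field still collapsed, and the client discovers the problem only when it tries to read a property on a string.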
Expansion should not be the only way to access related data. Customers should always be able to fetch the related resource directly via its own endpoint. The expand parameter is an optimization for response composition, not the primary access path. If your API only exposes child resources via parent-expansion, you have created a coupling that will hurt customers later.
The N+1 problem inside the API
The implementation gotcha that bites every team: expansion done naively produces N+1 database queries on the server side. The list endpoint returns 100 customers, the expand=billing_address parameter triggers 100 separate queries for billing addresses, and the response that was supposed to be faster than the un-expanded version is actually slower.
The fix is dataloader-style batching: collect the IDs that need resolution, issue a single query that fetches all of them, then construct the response. This is more code than the naive implementation and is one of the places where ORMs differ in their support for the pattern. Some make it nearly automatic once configured (SQLAlchemy with the selectinload loader option); some require explicit batching code; some make it impossible without dropping to raw SQL.
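Independent of any ORM, the batching step can be sketched as a small function that takes a bulk fetcher. The names expand_batched and fetch_many are hypothetical, and leaving an unresolvable ID in place as a plain string is an assumed policy, not the only reasonable one.

```python
from typing import Any, Callable, Dict, List

Resource = Dict[str, Any]


def expand_batched(
    items: List[Resource],
    field: str,
    fetch_many: Callable[[List[str]], Dict[str, Resource]],
) -> List[Resource]:
    """Replace item[field] IDs with full objects using one bulk lookup."""
    # 1. Collect the distinct IDs that need resolution.
    ids = sorted({item[field] for item in items if item.get(field) is not None})
    # 2. Resolve all of them in a single call (one query in a real backend).
    resolved = fetch_many(ids) if ids else {}
    # 3. Build the response without mutating the input rows.
    expanded: List[Resource] = []
    for item in items:
        ref = item.get(field)
        if ref is None:
            expanded.append(dict(item))
        else:
            # Assumed policy: an ID that did not resolve stays a plain ID.
            expanded.append({**item, field: resolved.get(ref, ref)})
    return expanded
```

Deduplicating the IDs first also means that 100 rows pointing at the same parent cost one lookup key, not 100.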
The diagnostic for an N+1 bug is database query count divergence between an N=1 request and an N=100 request. The expanded N=1 request should produce two queries (the list query and the batch-resolve query). The expanded N=100 request should produce the same two queries, not 101. If the count scales with N, you have a bug.
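This check is straightforward to automate. The sketch below uses sqlite3's set_trace_callback to count the statements issued by a naive and a batched handler; the schema and handler names are invented for the example, and a real test would wrap the actual endpoint with whatever query hook its database driver offers.

```python
import sqlite3
from typing import Callable, List


def make_db(n: int) -> sqlite3.Connection:
    """Create n customers, each pointing at its own address."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE addresses (id INTEGER PRIMARY KEY, city TEXT)")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, billing_address INTEGER)")
    conn.executemany("INSERT INTO addresses VALUES (?, ?)", [(i, f"city-{i}") for i in range(n)])
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(i, i) for i in range(n)])
    conn.commit()
    return conn


def list_naive(conn: sqlite3.Connection) -> None:
    rows = conn.execute("SELECT id, billing_address FROM customers").fetchall()
    for _, address_id in rows:  # one lookup per row: the N+1 bug
        conn.execute("SELECT id, city FROM addresses WHERE id = ?", (address_id,)).fetchone()


def list_batched(conn: sqlite3.Connection) -> None:
    rows = conn.execute("SELECT id, billing_address FROM customers").fetchall()
    ids = sorted({address_id for _, address_id in rows})
    marks = ",".join("?" * len(ids))
    conn.execute(f"SELECT id, city FROM addresses WHERE id IN ({marks})", ids).fetchall()


def count_queries(n: int, handler: Callable[[sqlite3.Connection], None]) -> int:
    conn = make_db(n)
    statements: List[str] = []
    conn.set_trace_callback(statements.append)  # records each executed statement
    handler(conn)
    conn.set_trace_callback(None)
    conn.close()
    return len(statements)
```

Asserting that count_queries(1, handler) equals count_queries(100, handler) in the test suite catches a regression to per-row lookups before it reaches production.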
The wildcard question
Some APIs support expand=* to expand all expandable fields. We do not recommend supporting this. The customers who would benefit are exactly the ones who should be using the un-expanded version with explicit follow-up requests: they want every field, including ones that future API versions might add as expandable. A wildcard expansion produces unpredictable response sizes that are hard to budget for, and the customer code becomes harder to maintain because it depends on the API's current expansion list rather than an explicit field list.
The opposite extreme (no expansion at all, always reference-only) is also wrong for APIs that serve dashboard backends. The dashboard case is real and the latency cost of not supporting expansion is large. The middle ground (explicit expand parameter with a documented allowlist) is the right answer for almost all B2B SaaS APIs.
The polymorphic complication
If your API has polymorphic references (an event_target field that can point to a customer, an invoice, or a subscription depending on event_type), expansion gets complicated. The expand parameter has to handle the case where the expanded type varies per row. The implementation typically uses a discriminator field that tells the client what type each expanded object has.
Stripe's polymorphic objects (events, sources) all use this pattern. Every Stripe API object carries an "object" field that names what kind of resource it is, and clients must check it before accessing type-specific fields. This is the right pattern but requires customer code to be more careful than the non-polymorphic case.
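On the client side, the extra care amounts to branching on the discriminator before reading anything else. In this sketch the discriminator is called resource_type and the per-type fields (email, amount_due, status) are hypothetical; substitute whatever names the API in question documents.

```python
from typing import Any, Dict, Union


def describe_target(target: Union[str, Dict[str, Any]]) -> str:
    """Summarize a polymorphic event_target that may or may not be expanded."""
    if isinstance(target, str):
        return f"unexpanded reference {target}"

    kind = target.get("resource_type")  # assumed discriminator field name
    if kind == "customer":
        return f"customer {target['email']}"
    if kind == "invoice":
        return f"invoice for {target['amount_due']}"
    if kind == "subscription":
        return f"subscription in state {target['status']}"
    # New types can appear as the API evolves; fail soft instead of raising.
    return f"unknown resource type {kind!r}"
```

Handling the unknown branch explicitly matters because a polymorphic field is exactly where an API is likely to add new target types later.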
Our use across the four products
DocuMint, CronPing, FlagBit, and WebhookVault currently do not implement an expand parameter. Most response shapes are simple enough that expansion does not pay off: invoices are mostly self-contained, monitors have a small fixed set of fields, flags are independent. The case where expansion would help most is WebhookVault's delivery records, where each delivery references an event and a subscription, and customers building dashboards on top would benefit from expanding both.
Adding expand to WebhookVault is on the roadmap. The implementation will follow the Stripe pattern: default to reference IDs, support expand=event,subscription on the deliveries list endpoint, validate the expand list against an allowlist, and use a batched query to resolve. The other three products will add expansion only if customer integration patterns make the case for it.
The deeper observation: the right API design depends on which customer integration patterns are common, not on which design is most elegant in the abstract. Most B2B SaaS APIs serve a mix of dashboard backends, webhook handlers, and data exports, and the right defaults are the ones that work for all three. The expand parameter is the convergent answer because it lets each integration pattern make its own choice without imposing cost on the others.
Our products: DocuMint (PDF invoice generation API), CronPing (cron job monitoring with status pages), FlagBit (feature flags API for modern teams), and WebhookVault (webhook capture and replay) keep the lights on.