Designing API Field Selection: Sparse Fieldsets and the Patterns That Stay Maintainable

When a list endpoint returns 30 fields per object and the customer wants 3, the unused bytes compound across every page and every client. Field selection is the underused API pattern that makes large responses survivable.

A list endpoint that returns 30 fields per object is fine when the customer wants all 30 fields. When the customer wants three of them, the unused 27 are pure waste, paid by the customer in bandwidth and parse time and by the provider in egress and database I/O. Field selection (also called sparse fieldsets, after the JSON:API spec) is the API pattern that lets customers ask for the fields they want and not pay for the ones they do not. It is underused outside large APIs where the bandwidth math forced the issue, but the pattern is straightforward and the gains are large in the cases it fits.

What field selection does

The basic shape is a query parameter that takes a comma-separated list of field names: GET /v1/invoices?fields=id,total,status. The response includes only the named fields plus a small set of always-included fields, typically object, id, and created. The customer gets a smaller response, the provider does less work, and the existing response format stays the same, just with fewer keys.
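A minimal sketch of the top-level case, in plain Python rather than FastAPI so it runs on its own. The helper names and the sample invoice are hypothetical. The always-included set uses the three fields named above.

```python
from typing import Any, Optional

# Always-included fields from the text; adjust per resource.
ALWAYS_INCLUDE = frozenset({"object", "id", "created"})


def parse_fields(raw: Optional[str]) -> Optional[set[str]]:
    """Parse 'id,total,status' into {'id', 'total', 'status'}.

    None (parameter absent) means "return every field". Blank entries from
    stray commas or spaces, as in 'id,,total ', are ignored.
    """
    if raw is None:
        return None
    return {name.strip() for name in raw.split(",") if name.strip()}


def select_fields(obj: dict[str, Any], fields: Optional[set[str]]) -> dict[str, Any]:
    """Keep the requested top-level keys plus the always-included ones."""
    if fields is None:
        return dict(obj)
    wanted = fields | ALWAYS_INCLUDE
    return {key: value for key, value in obj.items() if key in wanted}


# Hypothetical invoice, for illustration only.
invoice = {
    "object": "invoice", "id": "inv_123", "created": 1700000000,
    "total": 4200, "status": "paid", "currency": "usd", "memo": "Thanks!",
}
print(select_fields(invoice, parse_fields("id,total,status")))
# {'object': 'invoice', 'id': 'inv_123', 'created': 1700000000, 'total': 4200, 'status': 'paid'}
```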

The pattern scales naturally to nested objects via dot-notation or bracket-notation: fields=id,customer.email,line_items.amount. It composes with pagination (each page item is filtered), with includes/expands (an expanded object can have its own field selection), and with the existing endpoint surface (no new endpoints, just an additional query parameter).
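Dot-notation paths can be turned into a small tree and applied recursively, so that line_items.amount narrows every item in the list. The sketch below assumes that requesting a whole value (customer) overrides any sub-paths of it (customer.email). It also assumes that a scalar, such as an un-expanded ID, is passed through unchanged. Both are design choices, not part of any spec.

```python
from typing import Any, Optional

# A field tree maps each path segment to either None ("take this value whole")
# or a nested tree ("narrow this value further").
FieldTree = dict[str, Optional[dict]]


def build_tree(paths: list[str]) -> FieldTree:
    """Turn ['id', 'customer.email'] into {'id': None, 'customer': {'email': None}}."""
    tree: FieldTree = {}
    for path in paths:
        *parents, leaf = path.split(".")
        node = tree
        for segment in parents:
            if segment in node and node[segment] is None:
                break  # an ancestor was already requested whole
            node = node.setdefault(segment, {})
        else:
            node[leaf] = None  # requesting a whole value overrides any sub-paths
    return tree


def project(value: Any, tree: Optional[FieldTree]) -> Any:
    """Apply a field tree to a value, mapping over lists item by item."""
    if tree is None:
        return value
    if isinstance(value, list):
        return [project(item, tree) for item in value]
    if isinstance(value, dict):
        return {key: project(value[key], sub) for key, sub in tree.items() if key in value}
    return value  # a scalar (such as an un-expanded ID) cannot be narrowed further


# Hypothetical invoice with an expanded customer and embedded line items.
invoice = {
    "id": "inv_1", "total": 4200,
    "customer": {"id": "cus_1", "email": "ada@example.com", "name": "Ada"},
    "line_items": [{"amount": 1000, "description": "Setup"},
                   {"amount": 3200, "description": "Usage"}],
}
print(project(invoice, build_tree(["id", "customer.email", "line_items.amount"])))
# {'id': 'inv_1', 'customer': {'email': 'ada@example.com'}, 'line_items': [{'amount': 1000}, {'amount': 3200}]}
```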

The cases where it matters most

Field selection helps most when the response objects are wide (many fields), the customer pattern is narrow (asking for a few fields repeatedly), and the volume is high (lots of requests or lots of items per request). The combination is common in dashboard backends pulling summary data for hundreds of rows, mobile applications fetching list views over slow networks, and bulk export workflows pulling all records of a type.

The cases where it helps least are single-object fetches (the overhead of parsing the field list is comparable to the savings), small responses (where the protocol overhead dominates), and customers who use a small constant set of fields (where the right answer is often a separate endpoint).

Stripe's expand parameter covers the related-object case. Google's REST APIs accept a fields parameter for partial responses, and the Facebook Graph API uses the same parameter name. JSON:API formalizes the convention with the fields[type] bracket syntax. GraphQL APIs such as GitHub's and Linear's make per-field selection the default query model. The patterns have converged enough that customers expect the shape.

The basic implementation

Parse the fields parameter into a set of field paths. Validate them against a per-resource allowlist: fields the API exposes, not arbitrary names that could expose internal columns. Then restrict the database query to the named fields (if the storage layer supports it), and restrict the response serializer to emit only those fields.
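The parse-and-validate step might look like this. It assumes a hypothetical in-code allowlist keyed by resource name and a hypothetical UnknownFieldsError. A real service would translate that exception into its own 400 response.

```python
from typing import Optional

# Hypothetical per-resource allowlists of public field paths. Anything not
# listed here is rejected, even if a column with that name exists in storage.
ALLOWLISTS: dict[str, frozenset[str]] = {
    "invoice": frozenset({
        "id", "object", "created", "total", "status", "currency",
        "customer", "customer.email", "line_items", "line_items.amount",
    }),
}


class UnknownFieldsError(ValueError):
    """Raised when ?fields= names paths outside the resource's allowlist."""

    def __init__(self, resource: str, unknown: list[str]) -> None:
        self.resource = resource
        self.unknown = unknown
        super().__init__(f"unrecognized field(s) for {resource}: {', '.join(unknown)}")


def validated_fields(resource: str, raw: Optional[str]) -> Optional[frozenset[str]]:
    """Parse a raw ?fields= value and check every path against the allowlist."""
    if raw is None:
        return None  # no parameter: the full documented representation
    requested = {path.strip() for path in raw.split(",") if path.strip()}
    unknown = sorted(requested - ALLOWLISTS[resource])
    if unknown:
        raise UnknownFieldsError(resource, unknown)
    return frozenset(requested)
```

With this allowlist, validated_fields("invoice", "id,customer.email") returns the two paths. "id,internal_margin" raises, because internal_margin is not listed, regardless of what the table contains.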

The minimum-effort implementation does only the response-side filtering, fetching all fields from the database and dropping unused ones in the serializer. This gives the customer bandwidth and parse-time savings but does not save provider work. The full implementation pushes the field list into the database query (SELECT only the requested columns) and saves I/O as well.
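A sketch of the database-push version, using an in-memory SQLite table so it runs with only the standard library. The table, the public-name-to-column mapping, and fetch_invoices are hypothetical. The key property is that column identifiers in the SQL text come only from the mapping, never from the request. The sketch also assumes the field list was validated earlier, so unknown names are simply skipped here.

```python
import sqlite3
from typing import Any, Optional

# Hypothetical mapping from public field names to storage columns. Only these
# column names can ever reach the SQL text, so nothing from the request is
# interpolated into the query. The internal_margin column is deliberately absent.
INVOICE_COLUMNS = {"id": "id", "total": "total_cents", "status": "status", "memo": "memo"}


def fetch_invoices(conn: sqlite3.Connection, fields: Optional[set[str]]) -> list[dict[str, Any]]:
    """SELECT only the requested columns (plus id); None selects all public columns."""
    names = [name for name in INVOICE_COLUMNS
             if fields is None or name in fields or name == "id"]
    columns = ", ".join(f'"{INVOICE_COLUMNS[name]}"' for name in names)
    rows = conn.execute(f"SELECT {columns} FROM invoices ORDER BY id").fetchall()
    return [dict(zip(names, row)) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id TEXT PRIMARY KEY, total_cents INTEGER,"
             " status TEXT, memo TEXT, internal_margin REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?, ?)",
                 [("inv_1", 4200, "paid", "Thanks", 0.31),
                  ("inv_2", 990, "open", None, 0.12)])
print(fetch_invoices(conn, {"total", "status"}))
# [{'id': 'inv_1', 'total': 4200, 'status': 'paid'}, {'id': 'inv_2', 'total': 990, 'status': 'open'}]
```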

For most APIs the database-push optimization is worth doing for the wide-table cases (where the table has expensive-to-fetch columns like JSONB blobs or computed fields), and not worth doing for the narrow-table cases (where most of the cost is the row read, not the column read). The discipline is to measure which queries benefit before optimizing across the board.

The always-include and never-include lists

Some fields should always be included regardless of the field list: typically id, the object type identifier, and a request_id or pagination cursor. Some fields should never be includable: internal IDs, soft-delete timestamps that customers should not depend on, fields that are part of an internal audit trail. The allowlist enforces both.

The mistake is to treat field selection as a transparent passthrough. It is a permission boundary: the allowlist defines what the customer can ask for, not what the database holds. Internal-only fields should stay out of the allowlist even if they appear in the underlying table. Maintaining the per-resource allowlist becomes part of the schema-design discipline of distinguishing internal from external state.
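One way to make the always-include and never-include rules hard to get wrong is to check them when the per-resource policy is defined. The sketch below assumes a hypothetical FieldPolicy dataclass. It fails at startup if an always-included field is not public or if an internal column leaks into the allowlist. It also rejects internal names at request time exactly like typos.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldPolicy:
    """Hypothetical per-resource field policy checked when it is defined."""

    public: frozenset[str]    # the allowlist: what customers may request
    always: frozenset[str]    # returned regardless of ?fields=
    internal: frozenset[str]  # storage-only columns that must never be exposed

    def __post_init__(self) -> None:
        if not self.always <= self.public:
            raise ValueError(f"always-include fields not public: {sorted(self.always - self.public)}")
        leaked = self.public & self.internal
        if leaked:
            raise ValueError(f"internal fields in public allowlist: {sorted(leaked)}")

    def resolve(self, requested: Optional[frozenset[str]]) -> frozenset[str]:
        """Return the fields to serialize for one request."""
        if requested is None:
            return self.public
        unknown = requested - self.public
        if unknown:
            # Internal names fail exactly like typos, so the error does not
            # confirm that such a column exists.
            raise ValueError(f"unrecognized field(s): {sorted(unknown)}")
        return requested | self.always


INVOICE = FieldPolicy(
    public=frozenset({"object", "id", "created", "total", "status"}),
    always=frozenset({"object", "id"}),
    internal=frozenset({"deleted_at", "internal_margin"}),
)
```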

The expansion question

Field selection composes with the related-object expansion pattern but in a non-obvious way. If the API supports ?expand=customer to fully include the related customer object, the natural extension is ?expand=customer&fields=id,customer.email to include only the customer's email. The implementation needs to recognize that the customer.email field path implies expanding the customer relationship.

The cleanest design is to make field selection imply expansion when the path requires it, so customers do not need to specify expansion separately for fields. The alternative (requiring explicit expansion plus field selection) is more verbose and traps customers who think field selection works on related objects but forget the expand parameter.
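Implied expansion reduces to collecting every expandable prefix of each requested path. The set of expandable relation paths below is hypothetical, including the nested customer.default_source. Embedded lists such as line_items are deliberately absent, so they never trigger an expansion.

```python
# Hypothetical set of relation paths that ?expand= accepts. Embedded lists such
# as line_items are not relations, so they never trigger an expansion.
EXPANDABLE = frozenset({"customer", "customer.default_source", "subscription"})


def implied_expansions(fields: set[str], expand: set[str]) -> set[str]:
    """Explicit ?expand= values plus every expandable prefix of a ?fields= path."""
    expansions = set(expand)
    for path in fields:
        segments = path.split(".")
        # Check each proper prefix: 'customer.default_source.last4' yields
        # 'customer' and 'customer.default_source'.
        for i in range(1, len(segments)):
            prefix = ".".join(segments[:i])
            if prefix in EXPANDABLE:
                expansions.add(prefix)
    return expansions
```

With this helper, ?fields=id,customer.email needs no separate ?expand=customer. The customer expansion is derived from the path itself.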

Under the explicit-expansion alternative, the harder question is what to do with related-object field paths when expansion is not requested. The reasonable default is to silently include the paths that work on un-expanded objects (typically the related object's id) and ignore the rest, with documentation explaining the behavior. Returning an error here is technically more correct but produces customer frustration for a small benefit.
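The lenient default for un-expanded relations can be sketched as a split into answerable and ignored paths. The sketch assumes an un-expanded relation is serialized as a bare ID string, as in Stripe-style APIs. Under that assumption, customer.id is answered by the customer field itself, while customer.email cannot be answered. Returning the ignored set separately is a hypothetical design choice that lets the service log it or mention it in documentation.

```python
def split_unexpanded(fields: set[str], relations: set[str],
                     expanded: set[str]) -> tuple[set[str], set[str]]:
    """Return (kept, ignored) paths for a request with some relations un-expanded.

    An un-expanded relation is serialized as its ID string, so 'customer.id'
    is answered by the bare 'customer' field; other sub-paths such as
    'customer.email' cannot be answered and are ignored instead of rejected.
    """
    kept: set[str] = set()
    ignored: set[str] = set()
    for path in fields:
        head, sep, rest = path.partition(".")
        if not sep or head not in relations or head in expanded:
            kept.add(path)  # not a sub-path of an un-expanded relation
        elif rest == "id":
            kept.add(head)  # the ID string already answers customer.id
        else:
            ignored.add(path)
    return kept, ignored
```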

The wildcard problem

Customers will ask for wildcard syntax: ?fields=customer.* to include all customer fields, or ?fields=* to include all top-level fields. The wildcard is hostile to the API's ability to add fields without breaking customers. A customer who asks for customer.* will see new fields appear when they are added, which may break their parsing code or expose them to new data they did not expect to handle.

The right default is to not support wildcards at all. The exception is the no-field-list case (where omitting the parameter returns all fields by current convention), which is functionally a wildcard but anchored to the documented endpoint contract rather than a customer-supplied pattern. Customers who want all fields can omit the parameter; customers who want specific fields should name them explicitly.
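A sketch of the no-wildcard rule, assuming a hypothetical documented field tuple and a hypothetical WildcardNotSupported error. Omitting the parameter returns the documented representation. Any asterisk in the list is refused with a message that points the customer at both alternatives.

```python
from typing import Optional

# Hypothetical documented field list for one endpoint, in documented order.
DOCUMENTED_FIELDS = ("object", "id", "total", "status", "currency")


class WildcardNotSupported(ValueError):
    """Raised for '*' anywhere in ?fields=, such as '*' or 'customer.*'."""


def fields_to_return(raw: Optional[str]) -> tuple[str, ...]:
    if raw is None:
        # Omitting the parameter returns the documented representation: a
        # "wildcard" anchored to the contract rather than to a customer pattern.
        return DOCUMENTED_FIELDS
    names = [name.strip() for name in raw.split(",") if name.strip()]
    wildcards = [name for name in names if "*" in name]
    if wildcards:
        raise WildcardNotSupported(
            f"wildcards are not supported in fields: {', '.join(wildcards)}; "
            "name each field explicitly or omit the parameter"
        )
    return tuple(names)  # unknown names are assumed to be checked elsewhere
```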

Three patterns that hurt

First, accepting field names that do not exist without error. The customer asks for ?fields=id,totl (typo for total), the response includes id but no total, and the customer wonders why. The right behavior is to return a 400 with a field-not-recognized message that names the invalid field. The cost of strictness is occasional customer friction; the benefit is debuggability.
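The field-not-recognized error can do more than name the bad field. The sketch below uses the standard library's difflib.get_close_matches to suggest the nearest valid name. The error body shape, the error code, and the allowlist are hypothetical.

```python
import difflib
from typing import Any, Optional

# Hypothetical allowlist for the invoice resource.
INVOICE_FIELDS = ["object", "id", "created", "total", "status", "currency"]


def field_error(requested: list[str]) -> Optional[tuple[int, dict[str, Any]]]:
    """Return (400, body) naming every unrecognized field, or None if all are valid."""
    unknown = [name for name in requested if name not in INVOICE_FIELDS]
    if not unknown:
        return None
    hints = []
    for name in unknown:
        # With difflib's default cutoff of 0.6, 'totl' matches 'total'
        # (similarity ratio about 0.89); unrelated names get no suggestion.
        close = difflib.get_close_matches(name, INVOICE_FIELDS, n=1)
        hints.append(f"'{name}'" + (f" (did you mean '{close[0]}'?)" if close else ""))
    body = {
        "error": {
            "code": "field_not_recognized",  # hypothetical error code
            "param": "fields",
            "message": "Unrecognized field(s) in fields: " + ", ".join(hints),
            "invalid_fields": unknown,
        }
    }
    return 400, body


print(field_error(["id", "totl"])[1]["error"]["message"])
# Unrecognized field(s) in fields: 'totl' (did you mean 'total'?)
```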

Second, applying field selection to error responses. Error bodies should have a stable shape regardless of the field selection parameter, so customer error-handling code does not break under field selection. The implementation should bypass the field-selection layer for non-success responses.

Third, allowing field selection on write endpoints. POST/PATCH/PUT endpoints take a body that defines what should be written. Mixing in field selection for the response can produce confusing semantics, where the write applied every field in the body but the response shows only some. The clean pattern is to apply field selection only to GET responses and to apply it consistently across all GET endpoints.
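The second and third patterns both decide whether the selection layer runs at all, so one gate can enforce both. In this sketch, render_body is hypothetical, and so is where it would be called from (for example, each GET handler in a FastAPI app). It simply ignores ?fields= on writes. Rejecting the parameter there with a 400 would be the stricter alternative.

```python
from typing import Any, Optional

ALWAYS_INCLUDE = frozenset({"object", "id"})  # hypothetical


def render_body(method: str, status: int, body: dict[str, Any],
                fields: Optional[set[str]]) -> dict[str, Any]:
    """Apply top-level field selection only to successful GET responses."""
    if fields is None or method.upper() != "GET" or not 200 <= status < 300:
        # Error bodies keep one stable shape, and POST/PATCH/PUT responses
        # show the full written object.
        return body
    wanted = fields | ALWAYS_INCLUDE
    return {key: value for key, value in body.items() if key in wanted}
```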

Our use across the four products

We have not implemented field selection in any of our four products, because the response objects are narrow enough that the gains would be modest. DocuMint invoice objects have about 15 fields, and customers typically use all of them. CronPing monitor objects have about 12 fields. FlagBit flag objects have around 20 fields, including the rules array. WebhookVault request objects can be much larger, since captured headers and bodies can be substantial. It is the one product where field selection would pay off, particularly on the list endpoints that customers use to scan recent activity.

If we add the pattern, it will go on WebhookVault first, focused on the large headers and body fields, with a documented per-resource allowlist and explicit field-not-recognized errors. The deeper observation is that field selection pays back in proportion to response width and request volume, and most small-to-medium APIs do not have enough of either to justify the implementation cost. The exception cases are visible enough that knowing the pattern lets you recognize when to invest.


This essay is part of our ongoing series on API design. Our products DocuMint (PDF invoice generation API), CronPing (cron job monitoring with status pages), FlagBit (feature flags API for modern teams), and WebhookVault (webhook capture and replay) all run on FastAPI with structured JSON responses, currently without field selection but with the pattern available if response widths justify it.
