Designing API Error Code Catalogs: The Documentation That Earns Its Maintenance

Most APIs ship with a handful of documented error codes and a long tail of undocumented ones returned in practice. The teams whose customers integrate fastest are the teams whose error code catalog matches what the API actually returns.

An API error code is a contract. The customer's integration code branches on it. Customer support escalations reference it. Internal monitoring alerts on it. When the code is undocumented or its meaning drifts, every consumer of the contract pays the cost: customer code that retries on errors that should not be retried, support tickets that read "I got error X, what does it mean," monitoring dashboards that group unrelated failures under a single error. The error code catalog is one of the highest-leverage documentation surfaces an API team owns, and it is also one of the most consistently neglected.

What an error code is for

An error code is the machine-readable answer to a question the customer's code must resolve before it can act. Three questions account for most cases: should I retry, should I escalate to a human, or should I change the request before trying again? The status code answers the first coarsely (5xx is generally retryable, 4xx generally is not, with exceptions such as 429 Too Many Requests). The error code answers the second and third precisely.
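As a rough illustration of that split, the sketch below lets the error code choose the action when the client recognizes it and falls back to the status code otherwise. The codes are the examples used in this article; the action assigned to each one, and the names Action, ACTION_BY_CODE, and decide, are assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    RETRY = "retry"
    FIX_REQUEST = "fix_request"
    ESCALATE = "escalate"


# Hypothetical mapping: which action each code implies is an assumption
# made for this sketch, not something the API defines.
ACTION_BY_CODE = {
    "invoice.line_items.required": Action.FIX_REQUEST,
    "invoice.line_items.invalid_currency": Action.FIX_REQUEST,
    "webhook.signature.timestamp_too_old": Action.FIX_REQUEST,
    "account.payment_method.declined": Action.ESCALATE,
}


def decide(status: int, code: Optional[str]) -> Action:
    """Pick an action: a known error code decides, else the status code."""
    if code in ACTION_BY_CODE:
        return ACTION_BY_CODE[code]
    # Coarse fallback: 5xx is generally retryable, 4xx generally is not.
    if 500 <= status <= 599:
        return Action.RETRY
    return Action.ESCALATE
```

A client would call decide(response_status, body.get("code")) once per failed request and branch on the result, rather than scattering status checks through the integration.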

An error code that does not affect a decision the customer can make is documentation theater. We have seen catalogs with 200+ codes, of which 180 mapped to the same customer action (file a ticket) and could have been a single code with a request_id in the body. The right number of error codes is the number of distinct actions a customer might take, plus a small overhead for diagnostics.

The minimum viable error response body

Five fields cover most cases:

{
  "code": "invoice.line_items.required",
  "message": "At least one line item is required.",
  "field": "line_items",
  "doc_url": "https://docs.example.com/errors/invoice.line_items.required",
  "request_id": "req_01HABCD..."
}

The code is the stable contract. The message is the human-readable explanation. The field is the path to the offending input when applicable. The doc_url deep-links to the catalog entry for the code. The request_id is what the customer quotes to support so that the exact request can be located.

The two patterns we have seen fail most often are: omitting the stable code and forcing customers to string-match on the message (which then drifts when the message is improved), and omitting the doc_url so customers have to navigate the documentation site to find the catalog entry. Both increase friction in proportion to how often customers hit the error.
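One way to make both omissions hard is to build every error body through a single helper that always carries the code and derives the doc_url from it. The following is a minimal sketch under that assumption; ApiError, to_body, and DOCS_BASE are hypothetical names, and the URL base is the placeholder from the sample body above.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Placeholder base URL, taken from the sample response body.
DOCS_BASE = "https://docs.example.com/errors/"


@dataclass
class ApiError:
    code: str                    # stable, machine-readable contract
    message: str                 # human-readable, free to be reworded
    request_id: str              # quoted by the customer to support
    field: Optional[str] = None  # path to the offending input, if any

    def to_body(self) -> dict:
        """Return the JSON-serializable response body."""
        body = asdict(self)
        # The doc_url is derived from the code, so it cannot be forgotten.
        body["doc_url"] = DOCS_BASE + self.code
        if body["field"] is None:
            del body["field"]
        return body
```

Because the message never feeds into the code or the URL, it can be improved without breaking anything a customer branches on.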

Naming the codes

The convention that scales is resource.subresource_or_field.problem. For instance: invoice.line_items.required, invoice.line_items.invalid_currency, webhook.signature.timestamp_too_old, account.payment_method.declined. The pattern matches the typical hierarchy of customer-side code (one error handler per resource type, branching on the sub-resource and problem inside).

Three patterns to avoid: generic codes like validation_failed that force customers to parse the message anyway, version-suffixed codes like invoice_v2_error that promise stability they cannot keep, and HTTP-status-mimicking codes like bad_request that duplicate information already in the status code.
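A naming convention like this is easy to lint. The sketch below is one possible check, assuming codes have exactly three dot-separated snake_case segments and that a segment such as v2 signals a version marker; both regular expressions and the function name lint_code are illustrative choices, not an established rule set.

```python
import re
from typing import List

# Assumed shape: exactly three dot-separated lowercase snake_case segments.
CODE_PATTERN = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*){2}$")

# Assumed version marker: "v" plus digits as a whole word, e.g. "_v2_".
VERSION_MARKER = re.compile(r"(^|[._])v\d+([._]|$)")


def lint_code(code: str) -> List[str]:
    """Return a list of problems with an error code name (empty if fine)."""
    problems = []
    if not CODE_PATTERN.match(code):
        problems.append("expected resource.subresource_or_field.problem")
    if VERSION_MARKER.search(code):
        problems.append("contains a version marker")
    return problems
```

Generic and status-mimicking codes such as validation_failed and bad_request fail the shape check simply because they have no resource prefix.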

The catalog page

Each error code deserves a dedicated documentation page with: the code, the HTTP status it accompanies, when it occurs (preconditions), what the customer should do, a code example showing the fix, and any related error codes. The page is small (often 200 words) but its presence converts a support ticket into a self-service resolution.

The discipline that keeps the catalog honest is documentation-as-test: a CI check that scans the API source for error-code emission and fails if any emitted code is missing from the catalog. We have seen this catch dozens of undocumented codes that had crept in over months. The maintenance cost is small once the discipline is in place.
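A check of this kind can be quite small. The sketch below assumes that the service is written in Python, that every error is emitted through a hypothetical api_error("...") helper with a literal code as its first argument, and that the catalog is a directory with one page per code named <code>.md. All three are assumptions for illustration; a real check would match however your source emits codes.

```python
import re
import sys
from pathlib import Path
from typing import Iterable, Set

# Matches calls to the hypothetical helper, e.g. api_error("a.b.c", ...).
EMIT_PATTERN = re.compile(r"""api_error\(\s*["']([a-z0-9_.]+)["']""")


def emitted_codes(source_texts: Iterable[str]) -> Set[str]:
    """Collect every literal error code passed to api_error(...)."""
    codes: Set[str] = set()
    for text in source_texts:
        codes.update(EMIT_PATTERN.findall(text))
    return codes


def undocumented(emitted: Set[str], catalog: Set[str]) -> Set[str]:
    """Codes the source emits that have no catalog entry."""
    return emitted - catalog


def check(src_dir: str, catalog_dir: str) -> int:
    """Return 1 (and list offenders on stderr) if any code lacks a page."""
    sources = (p.read_text(encoding="utf-8") for p in Path(src_dir).rglob("*.py"))
    # Assumed layout: one catalog page per code, named <code>.md.
    catalog = {p.stem for p in Path(catalog_dir).glob("*.md")}
    missing = undocumented(emitted_codes(sources), catalog)
    for code in sorted(missing):
        print(f"undocumented error code: {code}", file=sys.stderr)
    return 1 if missing else 0
```

A CI step would call check() with the two directories and exit with its return value. Codes built dynamically at runtime would escape a literal scan like this one, which is a further argument for emitting them through a single helper.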

Code stability is a feature

An error code, once published, is part of the API contract and should be deprecated like any other API surface. The standard deprecation window (two years, in our practice) applies: announce the change, mark the code as deprecated in documentation and response headers, introduce the replacement code, and keep emitting the old code until the window closes.

The implementation pattern is to keep an internal mapping that lets a single emission point return either the old or new code based on a configuration flag or version negotiation header. The discipline of treating error codes as versioned API surface comes naturally to teams who have been burned by silently renaming a code and breaking customer integrations.
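A minimal sketch of such a mapping follows, assuming date-based API versions negotiated through a hypothetical Api-Version request header. The legacy code invoice.missing_items, the cutover date, and the function name code_for_client are invented for the example.

```python
from typing import Mapping

# Hypothetical rename: current code -> code published before the rename.
LEGACY_CODE = {
    "invoice.line_items.required": "invoice.missing_items",
}

# Hypothetical API version (ISO date) in which each current code appeared.
INTRODUCED_IN = {
    "invoice.line_items.required": "2024-06-01",
}


def code_for_client(code: str, headers: Mapping[str, str]) -> str:
    """Translate the internally emitted code to the one this client expects."""
    # Clients that send no version are treated as the oldest version.
    # Header lookup is case-sensitive here; a real server should normalize.
    version = headers.get("Api-Version", "")
    introduced = INTRODUCED_IN.get(code)
    # ISO dates compare correctly as plain strings.
    if introduced is not None and version < introduced:
        return LEGACY_CODE[code]
    return code
```

The service emits only the current code internally; the translation happens once, at the response boundary, so removing the legacy code at the end of the window means deleting two dictionary entries.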

Three patterns that hurt

Generic catch-all codes. A single internal_error that covers anything from disk-full to NullPointerException prevents customers from distinguishing transient from persistent failures. The retryable cases get conflated with the bug cases.

Messages that contain the action. Returning "You need to add a billing address" as the message, with no separate field or code that identifies the problem, forces customers to parse English to extract the actionable information. The message is for humans; the structure is for code.

Status code overloading. Using 400 Bad Request for both schema violations and business-rule violations is technically defensible but operationally painful. Reserving 400 for schema violations and returning 422 Unprocessable Entity (renamed Unprocessable Content in RFC 9110) for business-rule violations costs little and buys a meaningful signal.
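The 400 versus 422 split can be sketched as a validator that checks shape first and business rules second. Which checks count as schema and which as business rules is a judgment call; the assignment below, the payload shape, and the codes invoice.body.invalid_type and invoice.line_items.invalid_type are assumptions made for the example.

```python
from typing import Any, Optional, Tuple


def validate_invoice(payload: Any) -> Optional[Tuple[int, str]]:
    """Return (status, code) for the first problem found, or None if valid."""
    # Schema violations: the request has the wrong shape -> 400.
    if not isinstance(payload, dict):
        return 400, "invoice.body.invalid_type"
    items = payload.get("line_items")
    if items is None or items == []:
        return 400, "invoice.line_items.required"
    if not isinstance(items, list) or not all(isinstance(i, dict) for i in items):
        return 400, "invoice.line_items.invalid_type"

    # Business-rule violations: well-formed but not acceptable -> 422.
    # Assumed rule: every line item must use the invoice's currency.
    if any(i.get("currency") != payload.get("currency") for i in items):
        return 422, "invoice.line_items.invalid_currency"
    return None
```

With this split, a spike in 400s points at broken client serialization while a spike in 422s points at customers hitting product rules, before anyone looks at individual codes.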

The internal-facing benefit

A clean error code catalog also benefits the team running the API. Monitoring dashboards group failures by code, which surfaces patterns invisible to status-code-only grouping. Customer support routes tickets by code. Engineering on-call sees which code spiked and knows which subsystem to investigate. The investment pays back in operational visibility as well as customer experience.
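The difference between the two groupings is easy to see on a toy sample. The sketch below counts the same failures by status and by code; the sample data and the name summarize are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

# Made-up sample of (HTTP status, error code) pairs from a failure log.
SAMPLE = [
    (422, "invoice.line_items.invalid_currency"),
    (422, "invoice.line_items.invalid_currency"),
    (422, "invoice.line_items.invalid_currency"),
    (422, "account.payment_method.declined"),
]


def summarize(failures: Iterable[Tuple[int, str]]) -> Tuple[Counter, Counter]:
    """Count failures two ways: by HTTP status and by error code."""
    failures = list(failures)
    by_status = Counter(status for status, _ in failures)
    by_code = Counter(code for _, code in failures)
    return by_status, by_code
```

Grouped by status, the sample is four indistinguishable 422s; grouped by code, it shows that three of the four come from a single currency problem.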

Across DocuMint, CronPing, FlagBit, and WebhookVault we currently have around 30-40 error codes per product, each documented on its own page, and a CI check that fails if the source emits a code not in the catalog. The catalog has caught several would-be-undocumented codes over the past year. The deeper observation we have come back to is that the error code catalog is the part of API documentation that compounds: every customer integration depends on it, every support escalation references it, every monitoring alert groups by it, and the investment in keeping it accurate pays back across all three audiences for as long as the API exists.

Our products keep the lights on: DocuMint (PDF invoice generation API), CronPing (cron job monitoring with status pages), FlagBit (feature flags API for modern teams), and WebhookVault (webhook capture and replay).
