Designing API Rate Limit Documentation: How to Tell Customers What They Will Hit

Rate limit documentation is one of the most consistently underdone surfaces in B2B API design. Customers do not need a treatise; they need three numbers and an example. Most APIs deliver neither.

Nearly every API has rate limits, and most document them poorly. The standard pattern is a paragraph buried in a "Best Practices" page that mentions a single number like "1000 requests per minute" without explaining what counts as a request, whether the limit is per API key or per organization, whether burst traffic is allowed, which headers communicate remaining quota, which status code is returned when the limit is hit, or whether the limit is the same for every endpoint. Customers learn the actual behavior by hitting the limit in production.

The fix is mostly cheap. Most of the work is deciding what to document, not writing the documentation. Across DocuMint, CronPing, FlagBit, and WebhookVault we settled on a documentation pattern after a customer complained that the limits we had were correct but unfindable. The pattern below is what we wish we had started with.

The three numbers customers actually need

The minimum useful rate limit documentation answers three questions:

  1. What is the steady-state limit? (e.g. 60 requests per minute)
  2. What is the burst tolerance? (e.g. up to 100 in a 10-second window)
  3. What happens when I exceed it? (e.g. 429 status with Retry-After header)

The first two together describe the rate-limiting algorithm's behavior. The third tells the customer how to react. Without all three, customers cannot write client code that handles rate limiting correctly, and they will either over-provision retries (causing unnecessary load) or under-provision retries (causing failures the customer perceives as the API being broken).

Per-endpoint variation

If different endpoints have different rate limits, document the variation in a table on the rate-limiting page itself rather than scattered across endpoint reference pages. The customer wants to see all the limits at once when capacity planning.

The table columns we settled on are: endpoint pattern, steady-state limit, burst limit, scope (per API key, per organization, per IP), and notes. Notes is where you put the unusual cases: "limited per target subscription rather than per API key" or "counts each item in a batch as a separate request" or "shared with the batch endpoint's quota."
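As a rough sketch of that layout, the snippet below renders the five columns as an aligned plain-text table. The endpoint patterns and numbers are hypothetical and exist only to show the shape; EndpointLimit and render_table are illustrative names, not part of any product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointLimit:
    pattern: str     # endpoint pattern
    steady: str      # steady-state limit
    burst: str       # burst limit
    scope: str       # per API key, per organization, or per IP
    notes: str = ""  # unusual cases go here


# Hypothetical rows, for illustration only.
LIMITS = [
    EndpointLimit("POST /v1/documents", "60/min", "100 per 10 s", "API key"),
    EndpointLimit("POST /v1/documents/batch", "60/min", "100 per 10 s", "API key",
                  "counts each item in a batch as a separate request"),
    EndpointLimit("GET /v1/usage", "600/min", "none", "organization"),
]

HEADERS = ("Endpoint pattern", "Steady-state", "Burst", "Scope", "Notes")


def render_table(rows):
    """Return the limits as an aligned plain-text table, one endpoint per line."""
    cells = [HEADERS] + [(r.pattern, r.steady, r.burst, r.scope, r.notes) for r in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in cells) for i in range(len(HEADERS))]
    lines = ["  ".join(c.ljust(w) for c, w in zip(row, widths)).rstrip() for row in cells]
    lines.insert(1, "  ".join("-" * w for w in widths))  # rule under the header
    return "\n".join(lines)


print(render_table(LIMITS))
```

Keeping the rows in one structure like this also makes it easy to generate the same table for each plan tier.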

The scope question

Whether a rate limit applies per API key, per organization, per user, or per IP is one of the most consequential design decisions and one of the most under-documented. The customer using your API needs to know whether spinning up a second integration worker on another machine doubles their effective rate (per-IP, since the worker arrives from a new address), keeps it the same (per-API-key or per-org, assuming the worker reuses the same key), or interacts in some other way.

The honest default for most B2B APIs is per-API-key for short-window rate limits and per-organization for long-window quotas. Per-IP is operationally fragile because customers behind shared NAT (corporate networks, CDN egress) get unexpectedly throttled. Document the scope explicitly. If you have a complicated mix, document the precedence: "most restrictive limit wins, with scope reported in the X-RateLimit-Scope header."
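One way to picture the precedence rule is the sketch below. It assumes "most restrictive" means the scope with the fewest requests remaining, with ties going to the lower ceiling; that interpretation, along with the ScopeState type and the function names, is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopeState:
    scope: str      # e.g. "api-key", "organization", "ip"
    limit: int      # ceiling for the current window
    remaining: int  # requests left in the current window


def governing_scope(states):
    """Pick the scope that will throttle the caller first.

    Assumption: "most restrictive" means fewest requests remaining;
    ties go to the lower ceiling.
    """
    if not states:
        raise ValueError("at least one scope is required")
    return min(states, key=lambda s: (s.remaining, s.limit))


def rate_limit_headers(states):
    """Build the response headers reported for the governing scope."""
    s = governing_scope(states)
    return {
        "X-RateLimit-Limit": str(s.limit),
        "X-RateLimit-Remaining": str(s.remaining),
        "X-RateLimit-Scope": s.scope,
    }


# The organization quota is nearly exhausted, so it governs the response
# even though the per-key window still has room.
states = [ScopeState("api-key", 60, 12), ScopeState("organization", 10000, 4)]
print(rate_limit_headers(states))
```

Whatever rule you pick, state it in the documentation in one sentence so that customers can predict which scope the headers describe.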

The headers contract

Standard headers customers expect to see on every response:

  • X-RateLimit-Limit: the current window's ceiling.
  • X-RateLimit-Remaining: how many requests are left in the current window.
  • X-RateLimit-Reset: a Unix timestamp when the window resets.

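On a 429 response, also include Retry-After, given either as a number of seconds to wait or as an HTTP-date. This is the only piece of the rate-limiting contract that is genuinely standardized: RFC 6585 defines the 429 status code and notes that the response may carry Retry-After, and the header itself is defined in the core HTTP specification (RFC 9110). The X-RateLimit-* headers are a widespread convention, not a standard. Without Retry-After, customers will guess wait times and either retry too quickly (compounding the problem) or too slowly (delaying recovery).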

Document the headers in the rate-limiting page, not just in error response sections. Customers want to write clients that read the headers proactively to avoid hitting the limit at all.
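A proactive client can be sketched roughly as follows. It assumes the headers arrive as a dictionary keyed by the exact names listed above (real HTTP libraries usually expose a case-insensitive mapping) and that X-RateLimit-Reset is a Unix timestamp; the function name and the floor parameter are illustrative.

```python
import time


def seconds_to_pause(headers, now=None, floor=1):
    """Return how long to wait before the next request, based on quota headers.

    Returns 0.0 when at least `floor` requests remain, or when the headers
    are missing or malformed (the caller then falls back to 429 handling).
    """
    now = time.time() if now is None else now
    try:
        remaining = int(headers["X-RateLimit-Remaining"])
        reset_at = float(headers["X-RateLimit-Reset"])  # Unix timestamp
    except (KeyError, TypeError, ValueError):
        return 0.0
    if remaining >= floor:
        return 0.0
    # Quota is exhausted: wait until the window resets, never a negative time.
    return max(0.0, reset_at - now)


headers = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1030"}
print(seconds_to_pause(headers, now=1000))  # 30.0
```

Raising floor lets a client with several workers leave headroom for its siblings instead of racing for the last request in the window.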

The example code requirement

Documentation without code examples is documentation that customers will misimplement. The minimum useful example is a small block of pseudo-code showing the polite retry pattern: catch 429, parse Retry-After, wait the suggested time, retry once, give up if still failing. For higher-end APIs, providing this in your SDK or as a copy-pasteable utility removes one of the most common sources of customer integration bugs.
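The polite retry pattern might look something like this in Python. It is a sketch under stated assumptions: send is a hypothetical zero-argument callable returning (status_code, headers, body) so the example stays independent of any HTTP library, and the max_wait cap and the one-second default are additions for illustration, not part of the pattern described above.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, default=1.0):
    """Convert a Retry-After value (delay in seconds or an HTTP-date) to seconds."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to a short wait
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def call_with_one_retry(send, sleep=time.sleep, max_wait=60.0):
    """Call send(); on a 429, wait as instructed and retry exactly once.

    The second response is returned as-is, so the caller still sees the 429
    if the retry is throttled too.
    """
    status, headers, body = send()
    if status != 429:
        return status, headers, body
    wait = parse_retry_after(headers.get("Retry-After"))
    if wait > max_wait:
        return status, headers, body  # too long to block on; let the caller decide
    sleep(wait)
    return send()
```

Passing sleep as a parameter keeps the helper testable without real delays; production callers can leave the default.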

We provide the example in three forms (curl, Python, Node) for each product because they cover roughly 80% of our developer audience. Adding a fourth has diminishing returns; providing none is clearly costly.

The capacity-planning section

For paid plans with higher limits, customers want a section that helps them choose the right plan. The format we settled on is a table mapping plan tier to per-endpoint limits, plus a worked example: "If you generate 50 PDFs per minute on average and burst to 100, the Pro plan is enough." The worked example is what customers actually read.
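The reasoning behind such a worked example can be sketched as a small lookup. The plan names and numbers below are hypothetical, chosen only so that the 50-average, 100-burst case from the text lands on a tier called Pro; they are not real plan limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    steady_per_min: int  # sustained requests per minute
    burst_per_min: int   # highest short-term rate tolerated


# Hypothetical tiers, ordered from smallest to largest.
PLANS = [
    Plan("Starter", 20, 40),
    Plan("Pro", 60, 120),
    Plan("Scale", 300, 600),
]


def smallest_sufficient_plan(avg_per_min, peak_per_min, plans=PLANS):
    """Return the first plan covering both average and peak load, or None."""
    for plan in plans:
        if avg_per_min <= plan.steady_per_min and peak_per_min <= plan.burst_per_min:
            return plan
    return None


plan = smallest_sufficient_plan(avg_per_min=50, peak_per_min=100)
print(plan.name if plan else "contact support for higher limits")  # Pro
```

The None branch is the natural place to point customers at the process for requesting higher limits.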

For overage-billing plans, document the soft-limit behavior explicitly: do you throttle, queue, or charge overage? The wrong answer here generates more support tickets than the rate limits themselves.

The escalation path

For customers whose use case exceeds the documented limits, document how to request higher limits. The format is one sentence: "Email support with your use case and expected request volume; we typically increase limits within one business day for legitimate use cases." If you have an automated request form, link to it. If you do not raise limits for free-tier customers, say so explicitly so that the request does not generate false hope.

What not to document

Skip the implementation details of the rate-limiting algorithm. Customers do not need to know whether you use token bucket, sliding window, or leaky bucket; they need to know how the limit behaves from their perspective. Skip the internal rate-limit-bypass mechanisms (system accounts, internal APIs). Skip the rate-limiting infrastructure (Redis, in-memory counters, distributed coordination). All of these can change without breaking customer integrations, and documenting them creates a contract that constrains future implementation.

The deeper observation

Rate limit documentation is a special case of the general principle that API documentation should describe the customer-facing contract and the customer-actionable response, not the implementation. The customer cares about behavior under load and recovery from being throttled. The implementation can change as long as the documented behavior stays stable. The teams that document rate limits poorly are usually the teams that also under-invest in error messages, in changelog discipline, and in deprecation warnings: the common failure mode is treating documentation as an afterthought instead of as the actual customer surface area.
