Designing API Retry-After Headers: When to Tell Customers to Slow Down, and How Specific to Be
The Retry-After header is one of the most underused communication channels between API and client. Every API returns retryable status codes at some rate (429 Too Many Requests, 503 Service Unavailable, sometimes 502 and 504), and by default the question of how to respond is left entirely to the client. Retry-After is the standardized way for the API to tell the client what it thinks the right response is. Used well, it converts a temporary failure from an interruption into a coordinated handoff. Used poorly or omitted, it leaves clients to guess, and the guesses are often wrong in ways that make the situation worse.
What Retry-After does
The HTTP spec (RFC 7231, since superseded by RFC 9110) defines Retry-After as a response header that indicates how long the client should wait before retrying. It accepts two formats: a non-negative integer number of seconds (the common case) or an HTTP-date timestamp (for the rare cases where a fixed resume time is known). The semantics are advisory; the client is not required to honor the value, but well-behaved clients do.
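A minimal sketch of how a client might parse both formats, using only the Python standard library. The function name parse_retry_after and the choice to return None for unparseable values are assumptions made for illustration, not part of any spec.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Return the suggested delay in seconds, or None if the value is unparseable."""
    now = now or datetime.now(timezone.utc)
    value = value.strip()
    # Format 1: delay-seconds, a non-negative integer such as "120".
    if value.isascii() and value.isdigit():
        return float(value)
    # Format 2: HTTP-date, such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without a zone as UTC (an assumption of this sketch).
        when = when.replace(tzinfo=timezone.utc)
    # A date already in the past means "retry now", not a negative wait.
    return max(0.0, (when - now).total_seconds())
```

Returning None rather than raising lets the caller treat a malformed header the same way as a missing one and fall back to its own backoff.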
The header is most commonly returned with 429 Too Many Requests to indicate when the rate limit window will reset, and with 503 Service Unavailable to indicate when the service is expected to be back. It is also valid with 3xx redirects (uncommon in API contexts), where it gives the minimum time to wait before following the redirect. Servers also send it with 502 Bad Gateway and 504 Gateway Timeout when the upstream is temporarily unavailable, and in custom situations where the server knows the client should not retry immediately.
The three cases for Retry-After
The first case is the deterministic case: the server knows exactly when the client can succeed. Rate limits with a fixed reset window are the canonical example. If the rate limit resets at the top of each minute, and the current time is 12:34:42, the client cannot succeed for 18 more seconds; the right Retry-After value is 18. If the rate limit is a sliding window, the right value is the time until the oldest request in the current window ages out enough to allow another. In both cases the server can compute the value exactly.
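Both deterministic computations can be sketched in a few lines. This is an illustration under stated assumptions: fixed windows are aligned to the Unix epoch, timestamps are seconds as floats, and the function and parameter names are hypothetical.

```python
import math
from typing import Iterable


def fixed_window_retry_after(now: float, window_seconds: int = 60) -> int:
    """Seconds until the next window boundary, for windows aligned to the epoch."""
    elapsed = now % window_seconds
    return max(1, math.ceil(window_seconds - elapsed))


def sliding_window_retry_after(
    now: float,
    request_times: Iterable[float],
    limit: int,
    window_seconds: int = 60,
) -> int:
    """Seconds until enough old requests age out for one more to be allowed."""
    # Only requests newer than (now - window) still count against the limit.
    recent = sorted(t for t in request_times if t > now - window_seconds)
    if len(recent) < limit:
        return 0  # under the limit: no wait needed
    # This request must leave the window to bring the count below the limit.
    blocker = recent[len(recent) - limit]
    return max(1, math.ceil(blocker + window_seconds - now))
```

For the 12:34:42 example above, fixed_window_retry_after(12 * 3600 + 34 * 60 + 42) returns 18. Rounding up with math.ceil keeps the suggestion from ever landing a fraction of a second before the client can actually succeed.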
The second case is the predictable case: the server knows roughly how long the situation will last, even if not exactly. Planned maintenance is the canonical example; if the maintenance window is scheduled to end at a known time, the right Retry-After is the seconds remaining until that time. A backend service that is restarting has a roughly predictable recovery window (10-30 seconds for most services). A circuit breaker that is open has a recovery interval matched to the failure mode (commonly 30-60 seconds). In each case, the server can give a reasonable estimate, even if not perfect.
The third case is the indeterminate case: the server does not know when the client can succeed. Cascading failures, degraded mode under load, third-party dependency outages, and novel error conditions all fall here. The wrong answer is to make up a number; the right answer is to omit Retry-After and let the client fall back to its default backoff policy. The header is meant to give the client information it does not have; if the server does not have the information either, a made-up value is worse than none, because the client will trust it.
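The three cases can be folded into one decision: send the header only when the server has a target time to compute it from. The sketch below is illustrative; the Outage type and its field names are hypothetical and stand in for whatever state a real server tracks.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Outage:
    # Hypothetical fields, named for this sketch only.
    reset_at: Optional[float] = None              # deterministic: exact Unix time of reset
    expected_recovery_at: Optional[float] = None  # predictable: estimated Unix time of recovery


def retry_after_headers(outage: Outage, now: float) -> Dict[str, str]:
    """Return a Retry-After header when a target time is known, else no header."""
    target = outage.reset_at if outage.reset_at is not None else outage.expected_recovery_at
    if target is None:
        # Indeterminate: omit the header rather than invent a number.
        return {}
    return {"Retry-After": str(max(1, math.ceil(target - now)))}
```

An empty dict for the indeterminate case means the response simply carries no Retry-After, which leaves the client on its default backoff.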
The retry-storm failure mode
The retry-storm pattern is what happens when many clients receive errors simultaneously, all back off by the same amount of time, and all retry at the same instant, producing a synchronized load spike that re-triggers the failure. The classic example is a Kubernetes-style fleet of pods losing connection to a database and all retrying with the same fixed backoff; the database recovers, sees a sudden spike of retries, and falls over again.
Retry-After can itself cause retry storms when the server returns a fixed value. If every 429 response carries Retry-After: 60, each client retries 60 seconds after it received its 429, so clients that were throttled at nearly the same moment retry at nearly the same moment. How synchronized the retries are depends on how synchronized the original requests were, but in production environments, fleets of similar clients often have correlated request timing, so the retry timing is correlated too.
The mitigation is to add server-side jitter to the Retry-After value: instead of returning a constant 60 for all clients, return 50 + random(0, 20). The clients now spread their retries over a 20-second window, breaking the synchronization. When the base value is a hard reset time rather than an estimate, add the jitter on top of it instead of around it, so that no client is told to retry before it can succeed. The client side should also add its own jitter on top of the server-suggested value, as belt-and-suspenders for the case where the server forgot. The combination produces well-distributed retry traffic across the recovery window.
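A minimal sketch of jitter on both sides, using the 50 + random(0, 20) figures from the paragraph above. The function names, the 10% client-side spread, and the injectable rng parameter are assumptions made for illustration.

```python
import random


def jittered_retry_after(base_seconds: int, spread_seconds: int = 20, rng=random) -> int:
    """Server side: pick a value uniformly from [base, base + spread]."""
    return base_seconds + rng.randint(0, spread_seconds)


def client_delay(retry_after_seconds: float, max_extra_fraction: float = 0.1, rng=random) -> float:
    """Client side: never earlier than suggested, plus up to 10% extra wait."""
    extra = rng.uniform(0, retry_after_seconds * max_extra_fraction)
    return retry_after_seconds + extra
```

Calling jittered_retry_after(50) yields values between 50 and 70 inclusive. The client-side jitter is only ever added, never subtracted, so the server's value still acts as a floor.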
The Retry-After-and-rate-limit-header pair
For 429 responses, Retry-After is usually returned alongside the X-RateLimit family of headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset). These headers are a widely used convention rather than a standard, and they serve a different purpose: X-RateLimit-Reset tells the client when the window resets, by common convention as a Unix timestamp (some APIs send seconds remaining instead, so document which one you use), while Retry-After tells the client how many seconds to wait. The two should be consistent; if the rate limit resets at Unix timestamp T, and the current time is T-30, Retry-After should be 30.
The advantage of returning both: the client can pick the format it understands. Older clients only understand Retry-After; newer clients that want to display "rate limit resets in 30 seconds" use X-RateLimit-Reset directly. The cost of returning both is minimal (a few extra header bytes), and the compatibility benefit is meaningful.
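One way to keep the two values consistent is to derive both from the same reset timestamp in a single place. The sketch below assumes X-RateLimit-Reset is a Unix timestamp in seconds; the function name and parameters are hypothetical.

```python
import math
from typing import Dict


def rate_limit_headers(limit: int, remaining: int, reset_at: float, now: float) -> Dict[str, str]:
    """Build the headers for a 429 so Retry-After always agrees with the reset time."""
    reset = math.ceil(reset_at)  # whole-second Unix timestamp of the window reset
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset),
        # Same reset time expressed as a delay; round up so clients never retry early.
        "Retry-After": str(max(1, math.ceil(reset - now))),
    }
```

With reset_at set to T and now set to T-30, the Retry-After value is "30", matching the example above.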
The client side of the contract
The client-side discipline that makes the Retry-After contract work: honor the server-suggested value as a floor (do not retry sooner), but bound it with a sanity check (do not retry on an unrealistically long suggested value). The sanity check matters because servers occasionally return mistakes. A Retry-After: 31536000 (one year) is almost certainly a server bug; the client should cap the value at something reasonable (commonly 5 minutes for foreground operations, 1 hour for background jobs) and fall back to its default error handling rather than waiting a year.
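The floor-and-cap rule can be sketched as a single function. The caps mirror the 5-minute and 1-hour figures above; the function name, the constants, and the convention of returning None to mean "stop waiting and use default error handling" are assumptions of this sketch.

```python
from typing import Optional

FOREGROUND_CAP = 5 * 60   # seconds; cap for interactive operations
BACKGROUND_CAP = 60 * 60  # seconds; cap for background jobs


def plan_retry(
    retry_after: Optional[float],
    default_backoff: float,
    cap: float = FOREGROUND_CAP,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None to give up on waiting."""
    if retry_after is None:
        # No usable header: the client's own backoff policy applies.
        return default_backoff
    if retry_after > cap:
        # Implausibly long suggestion, likely a server bug: do not wait it out.
        return None
    # Treat the server's value as a floor; never retry sooner than suggested.
    return max(retry_after, default_backoff)
```

A background worker would pass cap=BACKGROUND_CAP, so a 30-minute suggestion is honored there but rejected for a foreground request.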
The client should also distinguish in its logging between cases where Retry-After was honored and cases where it was ignored. If the API consistently returns Retry-After: 60 and the client takes longer than that to retry, the operational cost is just slightly slower recovery from rate limits. If the client retries faster, it makes the rate-limit situation worse and may trigger additional throttling. The asymmetry is why client libraries should default to honoring server suggestions and require explicit opt-out for the cases where the client knows better.
Across our four products
We return Retry-After on all 429 responses across DocuMint, CronPing, FlagBit, and WebhookVault. The value is computed from the rate limit window: for a per-minute window, the server returns the seconds remaining until the window resets, with 5-20 seconds of random jitter added to prevent synchronized retries. For 503 responses (which we return only during the brief restart window when a container is recycling), we return Retry-After: 30 as a conservative estimate of restart time.
The deeper observation is that Retry-After is a contract between server and client about coordinated recovery. The contract works because both sides agree to behave sensibly; the server commits to giving useful values, and the client commits to honoring them. When either side breaks the contract, the cost falls on the other side: a server that returns wrong values produces avoidable client failures, and a client that ignores reasonable values produces avoidable server load. The pattern recurs across HTTP headers in general: the standards make space for cooperation, and the cooperation pays back in proportion to how seriously both sides take the contract.