API Design

Designing API Rate Limit Reset Mechanics: Fixed-Window, Sliding-Window, and What Customers See

Rate limits are commonly implemented in two ways that look identical in the documentation and behave very differently in practice. The fixed-window reset is cheap to implement and produces customer-visible cliff edges; the sliding-window reset is more expensive and produces a smoother experience.

Anethoth

01 Jun 2026 — 7 min read

The customer-facing description of a rate limit usually looks like "100 requests per minute" or "1000 requests per hour." The phrasing implies a clean window: a request count that resets at the boundary of each interval, with full quota available immediately after the reset. The implementations that produce that visible behavior split into two families, fixed-window and sliding-window, that behave identically when traffic is well-distributed and very differently when traffic is bursty or aligned with the window boundaries.

The fixed-window implementation

The fixed-window implementation aligns the rate-limit counter with wall-clock boundaries. For a 100-per-minute limit, the counter resets at the top of every minute. A request at 12:00:00.001 is the first request of that minute's window; a request at 12:00:59.999 is potentially the hundredth; a request at 12:01:00.001 is the first of the next window. The implementation requires one counter per (account, window) pair, and the counter can be a simple integer in a database row or a Redis counter with TTL.
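A minimal in-memory sketch of the fixed-window counter described above, assuming one process and a hypothetical `FixedWindowLimiter` class. The window index `floor(now / window)` changes exactly at wall-clock boundaries, so the (account, window) key plays the role of the database row or Redis key. A Redis-backed version would typically use INCR plus EXPIRE on a per-window key instead of the dictionary.

```python
import time
from collections import defaultdict
from typing import Optional


class FixedWindowLimiter:
    """Hypothetical fixed-window limiter: one counter per (account, window index)."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window = window_seconds
        # Old window keys are never pruned here; a Redis-backed version would set a TTL.
        self.counts = defaultdict(int)

    def allow(self, account: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # For a 60 s window the index increments at the top of every minute,
        # which is where the counter effectively resets.
        key = (account, int(now // self.window))
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True
```

With `FixedWindowLimiter(100, 60)`, a request at 12:00:59.999 and one at 12:01:00.001 land under different keys, which is the reset behavior the paragraph describes.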

The fixed-window approach has three properties that matter operationally. First, the implementation is cheap: incrementing a counter is a single Redis command or a single SQL UPDATE. Second, the failure mode is well-understood: when the limit is hit, every subsequent request in that window fails, and the customer's recovery is predictable (wait until the next window). Third, the counter can be reset on schema changes or operator intervention without complex state migration.

The fixed-window approach has one property that bites customers: the cliff edge at the window boundary. A customer who sends 100 requests in the last second of a minute and 100 requests in the first second of the next minute has stayed within the documented 100-per-minute limit in each window but has sent 200 requests in a 2-second span. The behavior is correct relative to the implementation but produces exactly the traffic pattern the rate limit was nominally supposed to prevent. The asymmetric case bites in the other direction: a customer whose average rate sits comfortably under the limit but whose requests bunch up inside a single window (a job scheduled on the minute, for example) hits the limit unexpectedly, because the counter measures each window in isolation rather than the customer's long-run rate.
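The boundary burst can be reproduced with a small simulation. This sketch assumes a hypothetical helper, `accepted_in_fixed_windows`, that replays request timestamps (in seconds) against a fixed-window counter and returns the ones it would accept.

```python
from typing import Dict, List


def accepted_in_fixed_windows(timestamps: List[float], limit: int, window: float) -> List[float]:
    """Replay timestamps against a fixed-window counter; return the accepted ones."""
    counts: Dict[int, int] = {}
    accepted = []
    for t in timestamps:
        index = int(t // window)  # which wall-clock window this request falls in
        if counts.get(index, 0) < limit:
            counts[index] = counts.get(index, 0) + 1
            accepted.append(t)
    return accepted


# 100 requests in the last second of minute 0, then 100 in the first second of minute 1.
burst = [59.0 + i / 100 for i in range(100)] + [60.0 + i / 100 for i in range(100)]
accepted = accepted_in_fixed_windows(burst, limit=100, window=60)
print(len(accepted), round(max(accepted) - min(accepted), 2))  # 200 1.99
```

Every request is accepted, even though all 200 arrive within two seconds.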

The sliding-window implementation

The sliding-window implementation does not align to wall-clock boundaries. Instead, the counter tracks the request count over a moving window that always extends a fixed duration into the past from the current moment. For a 100-per-minute limit, the question at any moment is "how many requests has this customer sent in the last 60 seconds?" The window slides continuously, and the cliff edge of the fixed-window approach disappears.
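The question "how many requests in the last 60 seconds?" can be answered exactly by keeping a log of accepted timestamps. This is a sketch with a hypothetical `SlidingLogLimiter` class, one instance per account. It evicts expired timestamps from the front of a deque rather than recounting, but it still stores one timestamp per request in the window, which is the cost the next paragraph is concerned with.

```python
from collections import deque


class SlidingLogLimiter:
    """Exact sliding window over (now - window, now] for a single account."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Drop timestamps that have slid out of the window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```

Unlike the fixed-window counter, nothing special happens at the top of the minute: capacity returns only as individual requests age out.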

The sliding-window implementation is more expensive. The naive version stores every request's timestamp and counts the requests inside the window on every check, which is O(n) per check. The optimized version keeps two fixed-window counters (the current window and the previous one) and estimates the sliding count by weighting the previous window's count by the fraction of that window still inside the sliding interval, then adding the current window's count. The approximation is good enough for most practical purposes and brings the per-check cost back to O(1), but the implementation is more complex than the fixed-window counter.
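A sketch of the two-counter approximation, assuming a hypothetical `SlidingWindowCounter` class for a single account. The estimate is `previous_count * (1 - elapsed_fraction) + current_count`, which assumes the previous window's requests were spread evenly across it; that assumption is the source of the approximation error.

```python
class SlidingWindowCounter:
    """Approximate sliding window built from the current and previous fixed-window counts."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self.current_index = 0
        self.current_count = 0
        self.previous_count = 0

    def _roll(self, now: float) -> None:
        index = int(now // self.window)
        if index == self.current_index:
            return
        # Exactly one window later, the current window becomes the previous one;
        # if more than one window has passed, both counts are stale.
        self.previous_count = self.current_count if index == self.current_index + 1 else 0
        self.current_count = 0
        self.current_index = index

    def estimate(self, now: float) -> float:
        self._roll(now)
        elapsed_fraction = (now % self.window) / self.window
        # Only the part of the previous window still inside (now - window, now] counts.
        return self.previous_count * (1.0 - elapsed_fraction) + self.current_count

    def allow(self, now: float) -> bool:
        if self.estimate(now) >= self.limit:
            return False
        self.current_count += 1
        return True
```

After 100 requests at 12:00:59, a fixed-window counter would accept a request at 12:01:00; this counter still estimates 100 there and rejects it, then estimates 50 at 12:01:30.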

The sliding-window approach has three properties that complement the fixed-window ones. First, the behavior is smooth at boundaries: a customer at the limit at one moment is still at or near the limit a moment later, and the count decreases only as old requests age out of the window. Second, the customer's recovery is gradual rather than binary: as time passes, capacity becomes available proportionally. Third, the implementation is harder to operate during incidents because the counter state is more complex.

What customers see

The customer-visible difference between the two implementations shows up in the headers and in the retry behavior. The X-RateLimit-Reset header on a fixed-window implementation is a discrete value (the timestamp of the next window boundary), and the customer's right response on a 429 is to wait until that timestamp. The X-RateLimit-Reset header on a sliding-window implementation is the moment at which at least one request of capacity returns, which is when the oldest request still in the window ages out (that request's timestamp plus the window duration), and the value changes continuously as requests arrive and expire.
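The two reset values can be computed as follows. This is a sketch with hypothetical function names. It assumes the header carries whole-second Unix timestamps, so values are rounded up rather than down, which avoids telling a client to retry before capacity has actually returned.

```python
import math
from typing import List


def fixed_window_reset(now: float, window_seconds: int) -> int:
    """Unix timestamp of the next wall-clock window boundary."""
    return (int(now // window_seconds) + 1) * window_seconds


def sliding_log_reset(timestamps: List[float], now: float, window_seconds: float, limit: int) -> int:
    """Unix timestamp at which at least one request of capacity is available again."""
    in_window = sorted(t for t in timestamps if t > now - window_seconds)
    if len(in_window) < limit:
        return math.ceil(now)  # capacity is available right now
    # The count must drop to limit - 1, so the (len - limit)-th oldest request
    # (the oldest one when len == limit) has to age out of the window.
    return math.ceil(in_window[len(in_window) - limit] + window_seconds)
```

For requests at t=100, 110, and 130 under a 3-per-60-seconds limit, checked at t=140, the fixed-window reset is 180 (the next minute boundary) while the sliding reset is 160 (when the request at t=100 ages out).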

The retry-after behavior differs in kind. Fixed-window retry-after says "your next attempt should be at the top of the next minute," and a synchronized retry storm is the natural consequence when many customers hit the limit in the same window. Sliding-window retry-after says "your next attempt should be in this many seconds," with the seconds value differing across customers based on their request history, and the retry storm tendency is correspondingly weaker.

The dashboard surface differs. A fixed-window dashboard shows usage as a bar chart with sharp drops at window boundaries; the bars are easy to read but the underlying truth (what is the customer's current rate of consumption) is obscured. A sliding-window dashboard shows usage as a continuous line; the line is harder to read at a glance but the truth is visible. Most B2B SaaS dashboards have settled on the bar-chart approach because the readability advantage outweighs the truth disadvantage for the typical customer use case (am I going to hit my limit this period).

The hybrid approaches

Most production rate-limit implementations are not pure fixed-window or pure sliding-window but variations on the theme. The token bucket algorithm tracks a continuously refilling token count with a maximum cap; it behaves like a sliding window for steady traffic and more like a fixed window for bursty traffic, because tokens accumulated up to the cap can be spent at once. The leaky bucket algorithm passes requests through a bucket that drains at a constant rate, which produces sliding-window-like behavior in the steady state and rejects bursts that would overflow the bucket's capacity.
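A minimal token bucket sketch, assuming a hypothetical `TokenBucket` class for a single account. `capacity` sets the largest burst and `refill_per_second` sets the steady-state rate; refill is computed lazily from elapsed time, so no background timer is needed.

```python
import time
from typing import Optional


class TokenBucket:
    """Continuously refilling token count, capped at capacity."""

    def __init__(self, capacity: float, refill_per_second: float, now: Optional[float] = None) -> None:
        self.capacity = capacity
        self.rate = refill_per_second
        self.tokens = capacity  # start full, so an idle customer can burst immediately
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None, cost: float = 1.0) -> bool:
        now = time.monotonic() if now is None else now
        # Add the tokens earned since the last check, never exceeding the cap.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True
```

With `capacity=10` and `refill_per_second=1.0`, a burst of 10 requests succeeds at once, after which requests are admitted at one per second.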

GCRA (the generic cell rate algorithm), frequently implemented on top of Redis, is a common production choice that provides token-bucket semantics while storing a single timestamp per customer, so its memory cost is fixed. The Stripe rate-limiter is a token bucket; the GitHub primary rate limiter is closer to fixed-window with custom burst allowances; the AWS API Gateway is a token bucket. The choice between implementations is mostly invisible to customers when the limits are set high enough that customers rarely hit them, and becomes visible at the limit edge where the smoothness of the response shapes customer perception.
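A sketch of GCRA under stated assumptions: a hypothetical `GCRA` class in which a dictionary stands in for one Redis key per customer. The only stored state is the theoretical arrival time (TAT). A request is allowed if the TAT, after advancing by one emission interval, would not run more than `burst` intervals ahead of now. In a real deployment the read-compare-write would need to be atomic, for example inside a Redis Lua script.

```python
from typing import Dict, Tuple


class GCRA:
    """Generic cell rate algorithm: one stored timestamp (the TAT) per key."""

    def __init__(self, rate_per_second: float, burst: int) -> None:
        self.emission_interval = 1.0 / rate_per_second  # spacing between requests at the steady rate
        self.burst_window = self.emission_interval * burst  # how far the TAT may run ahead of now
        self.tat: Dict[str, float] = {}  # stands in for one Redis key per customer

    def check(self, key: str, now: float) -> Tuple[bool, float]:
        """Return (allowed, retry_after_seconds)."""
        tat = max(self.tat.get(key, now), now)
        new_tat = tat + self.emission_interval
        allow_at = new_tat - self.burst_window
        if now < allow_at:
            return False, allow_at - now  # reject without updating state
        self.tat[key] = new_tat
        return True, 0.0
```

With `rate_per_second=1.0` and `burst=3`, three requests at the same instant succeed, the fourth is told to retry in one second, and the behavior matches a token bucket with capacity 3 refilling at one token per second.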

The burst allowance question

The customer-facing limit is usually a steady-state rate, but customer traffic is rarely steady-state. The question is whether to allow customers to exceed the steady-state rate for short bursts in exchange for staying below it on average. The token bucket approach gives this for free: the bucket cap determines the burst allowance, and the refill rate determines the steady-state. The fixed-window and sliding-window approaches require explicit burst handling, usually as a separate limit at finer granularity (1000 per minute steady plus 100 per second burst, for example).
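The "1000 per minute steady plus 100 per second burst" example can be sketched as two fixed-window tiers that must both have room. `TieredFixedWindow` is a hypothetical name. The design choice worth noting is that counters are only incremented once every tier has passed, so a request rejected by the burst tier does not also consume steady-state quota.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class TieredFixedWindow:
    """Hypothetical steady-plus-burst limiter built from fixed-window tiers."""

    def __init__(self, tiers: Dict[str, Tuple[int, int]]) -> None:
        # Maps a tier name to (limit, window_seconds), e.g. {"minute": (1000, 60), "second": (100, 1)}.
        self.tiers = tiers
        self.counts = defaultdict(int)

    def allow(self, now: float) -> Optional[str]:
        """Return None if allowed, otherwise the name of the first tier that is full."""
        keys = {name: (name, int(now // window)) for name, (_, window) in self.tiers.items()}
        for name, (limit, _) in self.tiers.items():
            if self.counts[keys[name]] >= limit:
                return name
        # Count the request only after it has passed every tier.
        for key in keys.values():
            self.counts[key] += 1
        return None
```

A customer sending 100 requests in each of ten consecutive seconds stays inside the burst tier every second and exhausts the steady-state tier after the tenth.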

The burst allowance interacts with retry behavior. Customers who retry on 429 with backoff naturally produce traffic patterns that test the burst allowance more than the steady-state. A customer that consistently retries at the boundary of acceptable behavior may pass the steady-state limit but fail the burst limit, and the failure manifests as retries that themselves get rate-limited. The right design either documents the burst limit explicitly or sets the steady-state generously enough that the burst behavior does not exceed it under normal retry patterns.

The reset semantics across multi-tier limits

Most APIs have multiple rate limits at different granularities: per-second, per-minute, per-hour, per-day, per-month. The reset semantics across the tiers matter for customer integration. If the per-minute limit is fixed-window and the per-hour limit is sliding-window, the customer sees two different reset behaviors in the same API and has to handle both. The right default is to use the same reset semantics across all tiers, with the choice being either all fixed-window (operationally simplest) or all sliding-window (customer-experience smoothest).

The interaction between tiers also matters. A customer who hits the per-minute limit but has not approached the per-hour limit gets a 429 with the per-minute reset time. A customer who has exhausted the per-hour limit gets a 429 with the per-hour reset time even when the per-minute counter is well below its limit. The right response in every case carries the reset time of the most restrictive exhausted limit (when several are exhausted, the one whose reset is furthest away, since the customer stays blocked until it clears), with the X-RateLimit-Scope header indicating which limit fired. The customer-side handling is the same in both cases (honor the Retry-After), but the operational diagnostics depend on knowing which limit fired.
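One way to build the 429 headers across tiers, interpreting "most restrictive" as the exhausted tier whose reset is furthest in the future. `TierState` and `rate_limit_headers` are hypothetical names, and the exact header set beyond X-RateLimit-Scope, X-RateLimit-Reset, and Retry-After is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TierState:
    scope: str       # e.g. "minute" or "hour"
    limit: int
    used: int
    reset_at: float  # Unix timestamp at which this tier's capacity returns


def rate_limit_headers(tiers: List[TierState], now: float) -> Optional[Dict[str, str]]:
    """Return 429 headers if any tier is exhausted, else None."""
    exhausted = [t for t in tiers if t.used >= t.limit]
    if not exhausted:
        return None
    # The customer stays blocked until every exhausted tier clears,
    # so report the tier whose reset is furthest away.
    binding = max(exhausted, key=lambda t: t.reset_at)
    return {
        "Retry-After": str(max(0, math.ceil(binding.reset_at - now))),
        "X-RateLimit-Limit": str(binding.limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(math.ceil(binding.reset_at)),
        "X-RateLimit-Scope": binding.scope,
    }
```

A customer with plenty of per-minute headroom but an exhausted hourly quota receives the hourly reset and `X-RateLimit-Scope: hour`, which is what the operational diagnostics need.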

Three patterns that fail

The first failure is hidden window-boundary cliffs. A customer who is told their limit is 100 per minute reasonably assumes the limit is roughly steady. If the implementation is fixed-window, the actual behavior produces cliff edges at the window boundaries that the customer's monitoring picks up as anomalies. The fix is either to use sliding-window or to document the fixed-window behavior explicitly so customers can build their retry logic around it.

The second failure is mismatched reset semantics across tiers. A customer who debugged the per-minute limit handling against a sliding-window implementation gets bitten when the per-hour limit is fixed-window and behaves differently. The fix is to standardize the semantics across tiers, and to document the choice clearly in the rate-limit documentation.

The third failure is inconsistent header semantics. Some implementations return X-RateLimit-Reset as seconds-until-reset, some as a Unix timestamp of the reset, some as an ISO-8601 timestamp. The mismatch produces integration bugs that stay silent until a client misreads the value, for example treating a seconds-until-reset value of 30 as a Unix timestamp, concluding the limit reset in 1970, and retrying immediately. The fix is to pick one format (Unix timestamp is the most common production choice, with Stripe and GitHub as reference cases) and document it explicitly with example code.
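On the client side, a defensive integration can normalize all three formats into an absolute Unix timestamp. This is a sketch with a hypothetical `reset_to_unix` function. The cutoff that separates seconds-until-reset from a Unix timestamp is a heuristic assumption (1e9 seconds after the epoch falls in September 2001), which is exactly why documenting one format is the better fix.

```python
from datetime import datetime, timezone

# Heuristic (assumption): numbers below this are seconds-until-reset, above it Unix timestamps.
DELTA_SECONDS_CUTOFF = 1_000_000_000


def reset_to_unix(value: str, now: float) -> float:
    """Normalize an X-RateLimit-Reset value to a Unix timestamp."""
    value = value.strip()
    try:
        number = float(value)
    except ValueError:
        # Not numeric: try ISO-8601. fromisoformat only accepts a trailing "Z" from Python 3.11 on.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
        return parsed.timestamp()
    if number < DELTA_SECONDS_CUTOFF:
        return now + number  # seconds-until-reset
    return number  # already a Unix timestamp
```

All three of `"30"`, `"1700000030"`, and `"2023-11-14T22:13:50Z"` normalize to the same instant when `now` is 1700000000.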

Our use across the four products

Our four products use a sliding-window implementation backed by a shared rate-limiting service. The choice was made for customer-experience reasons: the products are developer-facing, and developers building integrations against our APIs are sensitive to the boundary behavior in ways that end-user-facing APIs may not be.

The implementation uses the two-counter approximation rather than per-request timestamp storage, with the approximation error well under one percent of the limit for typical traffic patterns. The X-RateLimit-Reset header is returned as a Unix timestamp, and the X-RateLimit-Remaining header reports the limit minus the current sliding-window count. The Retry-After header on 429 responses includes server-side jitter (the reported value is randomized within a small window) to spread customer retry timing and reduce synchronized retry storms.
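A sketch of server-side Retry-After jitter, assuming a hypothetical `jittered_retry_after` function and a two-second jitter window chosen only for illustration. The jitter is additive, so the reported value never tells a client to retry before capacity has actually returned; it only spreads clients that were blocked at the same moment across the following seconds.

```python
import math
import random
from typing import Optional


def jittered_retry_after(seconds_until_reset: float, max_jitter_seconds: float = 2.0,
                         rng: Optional[random.Random] = None) -> int:
    """Whole-second Retry-After value with random extra delay added."""
    rng = rng or random.Random()
    # Only lengthen the delay: retrying earlier than the reset would just earn another 429.
    jitter = rng.uniform(0.0, max_jitter_seconds)
    return max(1, math.ceil(seconds_until_reset + jitter))
```

Two clients blocked with ten seconds remaining will usually be told different values between 10 and 12, instead of both retrying at the same instant.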

The dashboard surface presents usage as a continuous line chart of the sliding-window count over the most recent hour. The chart is harder to read than a bar chart but reflects the truth of the limiter's behavior, and customers who hit limits have a better mental model of what is actually being measured.

The deeper observation is that the rate-limit implementation choice is mostly invisible to customers until they hit the limit, at which point the choice shapes their entire experience of the constraint. The fixed-window approach trades implementation simplicity for customer-visible cliff edges; the sliding-window approach trades implementation complexity for smoother boundary behavior. The right choice depends on customer profile, with developer-facing APIs tending toward sliding-window and end-user-facing APIs tending toward fixed-window. The decision is one of the architectural choices that compounds across the customer relationship, and getting it right early is much cheaper than migrating between implementations later.

Our products, DocuMint (PDF invoice generation API), CronPing (cron job monitoring with status pages), FlagBit (feature flags API for modern teams), and WebhookVault (webhook capture and replay), put these patterns into production.
