Vol. IV · No. 04 Monday · 29 June 2026
engineering Dispatch 3 min read · 10 Jun 2026

Why Your Rate Limiter Should Use a Sliding Window: Token Bucket vs Fixed Window vs Sliding Log

Fixed window rate limiters are easy to implement and easy to abuse. A client that times its requests correctly can send double your intended limit at every window boundary. Sliding window approaches close that gap.

Many rate limiter implementations use a fixed window: reset the counter at the start of each minute and reject requests once the counter reaches the limit. It's straightforward to implement and cheap to store. It's also easy to exploit.

Take a limit of 100 requests per minute. A client that sends 100 requests in the last second of one minute and 100 more in the first second of the next has sent 200 requests in two seconds, double the limit you intended to enforce, without ever triggering your rate limiter. This isn't a theoretical concern. Determined clients find window boundaries quickly, and well-intentioned retry logic creates the same pattern accidentally.

The four approaches

Fixed window. Maintain a counter per time window. Increment on request, reject when the counter exceeds the limit, reset at window boundary. Simple, O(1) memory per key. Vulnerable to boundary doubling. Implementation: a single Redis key with TTL.
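
To make the boundary problem concrete, here is a minimal in-memory sketch of a fixed window limiter. The class name FixedWindowLimiter, the helper countAccepted, and the injectable nowSeconds argument are all illustrative assumptions, not part of any library; a production version would keep the counter in shared storage and expire old windows.

```javascript
// Hypothetical in-memory fixed-window limiter (illustrative names only).
class FixedWindowLimiter {
  constructor(limit, windowSeconds) {
    this.limit = limit;
    this.windowSeconds = windowSeconds;
    // "key:windowIndex" -> count. Old windows are never evicted in this sketch.
    this.counters = new Map();
  }

  // nowSeconds is injectable so the boundary case can be reproduced exactly.
  allow(key, nowSeconds = Date.now() / 1000) {
    const windowIndex = Math.floor(nowSeconds / this.windowSeconds);
    const bucket = `${key}:${windowIndex}`;
    const count = this.counters.get(bucket) || 0;
    if (count >= this.limit) return false;
    this.counters.set(bucket, count + 1);
    return true;
  }
}

// Send n requests at time t and count how many are accepted.
function countAccepted(limiter, key, n, t) {
  let accepted = 0;
  for (let i = 0; i < n; i++) {
    if (limiter.allow(key, t)) accepted++;
  }
  return accepted;
}

const limiter = new FixedWindowLimiter(100, 60);
const lateBurst = countAccepted(limiter, "client-a", 100, 59.5);  // end of window 0
const earlyBurst = countAccepted(limiter, "client-a", 100, 60.5); // start of window 1
console.log(lateBurst + earlyBurst); // 200, all within about one second
```

Both bursts land in different windows, so each one sees a fresh counter and every request is accepted.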

Token bucket. Maintain a token count that replenishes at a constant rate. Each request consumes one token. When the bucket is empty, reject. This smooths burst behavior over time — a client that was idle accumulates tokens and can burst, but only up to the bucket maximum. Requires storing last-refill time alongside token count. Naturally handles bursts without window boundary issues.
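
A token bucket can be sketched in a few lines. This is a single-process illustration under stated assumptions: the class name TokenBucket, its constructor parameters, and the injectable clock are hypothetical, and the bucket is assumed to start full.

```javascript
// Hypothetical single-process token bucket (illustrative names only).
class TokenBucket {
  constructor(capacity, refillPerSecond, nowSeconds = Date.now() / 1000) {
    this.capacity = capacity;               // maximum burst size
    this.refillPerSecond = refillPerSecond; // steady-state rate
    this.tokens = capacity;                 // assumption: start with a full bucket
    this.lastRefill = nowSeconds;
  }

  tryConsume(nowSeconds = Date.now() / 1000) {
    // Refill lazily, based on the time elapsed since the last refill.
    if (nowSeconds > this.lastRefill) {
      const elapsed = nowSeconds - this.lastRefill;
      this.tokens = Math.min(
        this.capacity,
        this.tokens + elapsed * this.refillPerSecond
      );
      this.lastRefill = nowSeconds;
    }
    if (this.tokens < 1) return false; // bucket empty: reject
    this.tokens -= 1;
    return true;
  }
}

// Capacity 5, refilling 1 token per second, created at t = 0.
const bucket = new TokenBucket(5, 1, 0);
let burstAccepted = 0;
for (let i = 0; i < 6; i++) {
  if (bucket.tryConsume(0)) burstAccepted++;
}
console.log(burstAccepted);        // 5: the burst is capped at the bucket size
console.log(bucket.tryConsume(2)); // true: two tokens refilled after two seconds
```

Only two values are stored per key, the token count and the last refill time, which matches the storage requirement described above.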

Sliding window log. Store a timestamp for every request made within the window. On each request, remove timestamps older than the window, count the remaining timestamps, reject if the count has reached the limit, and otherwise append the new timestamp. This is exact: no boundary doubling, no burst accumulation. It costs O(N) memory per key, where N is the number of requests in the window, which gets expensive quickly at high volume.
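
The log approach might look like the following in-memory sketch. The class name SlidingWindowLog is hypothetical, and the sketch assumes only accepted requests are recorded; some implementations log rejected requests too, which penalizes clients that keep hammering.

```javascript
// Hypothetical in-memory sliding window log (illustrative names only).
class SlidingWindowLog {
  constructor(limit, windowSeconds) {
    this.limit = limit;
    this.windowSeconds = windowSeconds;
    this.logs = new Map(); // key -> ascending array of request timestamps
  }

  allow(key, nowSeconds = Date.now() / 1000) {
    const cutoff = nowSeconds - this.windowSeconds;
    const log = this.logs.get(key) || [];

    // Drop timestamps that have fallen out of the window.
    let firstLive = 0;
    while (firstLive < log.length && log[firstLive] <= cutoff) firstLive++;
    const live = log.slice(firstLive);
    this.logs.set(key, live);

    if (live.length >= this.limit) return false;
    live.push(nowSeconds); // assumption: only accepted requests are logged
    return true;
  }
}

// Same boundary attack as before: 100 requests at 59.5s, 100 more at 60.5s.
const logLimiter = new SlidingWindowLog(100, 60);
let acceptedTotal = 0;
for (let i = 0; i < 100; i++) {
  if (logLimiter.allow("client-a", 59.5)) acceptedTotal++;
}
for (let i = 0; i < 100; i++) {
  if (logLimiter.allow("client-a", 60.5)) acceptedTotal++;
}
console.log(acceptedTotal); // 100: the second burst is rejected in full
```

The array of timestamps is the cost: one entry per accepted request per key for the length of the window.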

Sliding window counter. Track counts for the current and previous fixed windows, then compute an approximate current rate using the fraction of the previous window that falls within the sliding window. At 30 seconds into a 60-second window, weight the previous window's count at 50% and the current window's count at 100%. The approximation assumes requests in the previous window were evenly distributed. It introduces a small error (typically under 1%) but requires only two counters instead of a full log.

What the sliding window counter looks like in practice

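// Sliding window counter. getCount and increment are storage helpers
// that are not shown here.
function isRateLimited(key, limit, windowSeconds) {
  const now = Date.now() / 1000; // current time in seconds
  const currentWindow = Math.floor(now / windowSeconds);
  const prevWindow = currentWindow - 1;
  // Seconds elapsed since the current window started.
  const elapsed = now - (currentWindow * windowSeconds);
  // Fraction of the previous window still inside the sliding window.
  const prevWeight = 1 - (elapsed / windowSeconds);

  const currentCount = getCount(key, currentWindow);
  const prevCount = getCount(key, prevWindow);

  // Weighted estimate of requests in the last windowSeconds.
  const estimated = (prevCount * prevWeight) + currentCount;

  if (estimated >= limit) return true;

  // Note: the reads above and this write are not atomic, so concurrent
  // requests can both pass the check.
  increment(key, currentWindow, windowSeconds * 2);
  return false;
}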

Each window's key gets a TTL of twice the window duration. That keeps its count available for the whole of the following window, when it is read as the previous window's count.
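
The snippet above leaves getCount and increment undefined. As a rough sketch, here are in-memory stand-ins with the same call shape. Everything in it is an assumption for illustration: the Map-based store, the replaceable nowSeconds clock, and the choice to set the expiry only when a key is first written. A shared store such as Redis would replace this in a multi-process deployment.

```javascript
// Hypothetical in-memory stand-ins for the getCount and increment helpers.
const counters = new Map(); // "key:windowIndex" -> { count, expiresAt }

// Replaceable clock (seconds); swapping it out makes expiry testable.
let nowSeconds = () => Date.now() / 1000;

function getCount(key, windowIndex) {
  const id = `${key}:${windowIndex}`;
  const entry = counters.get(id);
  if (!entry) return 0;
  if (entry.expiresAt <= nowSeconds()) {
    counters.delete(id); // lazily evict expired windows
    return 0;
  }
  return entry.count;
}

function increment(key, windowIndex, ttlSeconds) {
  const id = `${key}:${windowIndex}`;
  const current = getCount(key, windowIndex); // also evicts an expired entry
  const existing = counters.get(id);
  // Assumption: the TTL is set on first write and not refreshed afterwards.
  const expiresAt = existing ? existing.expiresAt : nowSeconds() + ttlSeconds;
  counters.set(id, { count: current + 1, expiresAt });
  return current + 1;
}
```

Because a window's first write happens at or after the window starts, an expiry of twice the window length always falls at or after the end of the following window, so the previous count is never lost early.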

Which one to use

Token bucket is correct when you want to allow legitimate bursts (a user uploading a batch of files, an integration that queues work and sends it periodically). The accumulated tokens represent genuine idle capacity, and consuming them quickly is legitimate behavior.

Sliding window counter is correct when you want to enforce a rate uniformly without allowing burst accumulation — API endpoints where consistent throughput matters more than accommodating batchy clients. The approximation error is negligible for most use cases.

Sliding window log is correct only when you need exact tracking and have low enough volume that per-request storage is acceptable. Most production rate limiters don't use this.

Fixed window is acceptable when boundary doubling won't cause real problems — internal service-to-service calls with trusted clients, or cases where approximate enforcement is genuinely sufficient.

What rate limiting cannot fix

Rate limiting prevents volume abuse. It doesn't prevent slow credential stuffing that stays under your limit, doesn't handle distributed abuse across many IPs, and doesn't substitute for authentication. A rate limiter is a throughput control, not a security boundary. Build it as one.

The common mistake is applying rate limiting at the wrong layer. A rate limiter in front of your authentication endpoint that limits by IP will be circumvented by a botnet with thousands of IPs. For authenticated endpoints, limiting by user is more effective than limiting by IP. For pre-authentication flows, limit by email address or fingerprint. The key you limit on matters as much as the algorithm you use.
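
One way to encode that choice is a small function that picks the limiter key from whatever identity the request carries. This is only a sketch: the request shape ({ ip, userId, email }) and the function name rateLimitKey are hypothetical, and the fallback order is an assumption you would adapt to your own flows.

```javascript
// Hypothetical request shape: { ip, userId, email }. Names are illustrative.
function rateLimitKey(request) {
  // Authenticated traffic: limit per user, regardless of source IP.
  if (request.userId) return `user:${request.userId}`;
  // Pre-authentication flows (login, password reset): limit per target account.
  if (request.email) return `email:${request.email.trim().toLowerCase()}`;
  // Last resort: fall back to the source IP.
  return `ip:${request.ip}`;
}

console.log(rateLimitKey({ ip: "203.0.113.7", userId: "u42" }));
console.log(rateLimitKey({ ip: "203.0.113.7", email: " Ada@Example.com " }));
console.log(rateLimitKey({ ip: "203.0.113.7" }));
```

Prefixing the key with its type keeps a user ID from ever colliding with an email address or IP in the same counter store.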

Written by

Vera

Engineering researcher. APIs, databases, infrastructure, systems design.
