
A Practical Guide to API Rate Limiting

Rate limiting is one of those features that seems simple until you try to build it well. The basic idea (restrict how many requests a client can make) is straightforward. The details are where it gets tricky.

Anethoth

18 Apr 2026 — 2 min read

Why Rate Limit

Three reasons, in order of importance:

Protect your infrastructure. A single client making 10,000 requests per second can take down your API for everyone. Rate limiting prevents one bad actor (or one buggy integration) from becoming an outage.
Enforce fair usage. Your pricing tiers promise different levels of access. Rate limiting is how you deliver on that promise without manually monitoring every account.
Signal quality. APIs without rate limits feel amateur. Rate limits, paradoxically, make your API feel more professional — they signal that other people are using it and that you care about uptime.

The Three Algorithms

Token Bucket

Imagine a bucket that holds N tokens. Each request removes one token. Tokens refill at a fixed rate. When the bucket is empty, requests are rejected until tokens refill.

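import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity          # maximum burst size
        self.tokens = capacity            # start with a full bucket
        self.refill_rate = refill_rate    # tokens added per second
        self.last_refill = time.monotonic()  # monotonic clock: immune to wall-clock jumps

    def allow(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # spend one token on this request
            return True
        return False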

Pro: Allows bursts up to the bucket capacity. A client whose bucket holds 100 tokens can make 100 requests instantly, then must wait for tokens to refill. This matches real-world usage patterns: most API clients send bursts, not steady streams.

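Con: Slightly more complex to implement correctly in distributed systems. The token count and last-refill timestamp are shared state, so every server must read and update them atomically, or two concurrent requests can spend the same token.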

Fixed Window

Count requests in fixed time windows (e.g., per minute). Reset the counter at the start of each window.

Pro: Dead simple. One counter per key per window.

Con: Boundary problem. A client can make 100 requests at 11:59:59 and another 100 at 12:00:00 — effectively 200 requests in 2 seconds while staying within a "100 per minute" limit.
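A minimal in-memory sketch of the fixed-window counter, assuming windows aligned to the Unix epoch and a plain dictionary as the store (the class name `FixedWindowLimiter` is hypothetical). Timestamps are passed explicitly, in seconds, so the boundary problem can be reproduced: with a minute boundary at t = 120, 100 requests just before it and 100 right at it are all accepted.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Fixed-window counter: one count per (key, window) pair, kept in memory."""

    def __init__(self, limit, window_size=60):
        self.limit = limit              # max requests per window
        self.window_size = window_size  # window length in seconds
        self.counts = defaultdict(int)  # (key, window_index) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        # Epoch-aligned windows: every client shares the same boundaries
        window = int(now // self.window_size)
        if self.counts[(key, window)] >= self.limit:
            return False
        self.counts[(key, window)] += 1
        return True


# Reproduce the boundary problem: 100 requests at t=119.5, 100 more at t=120.0
limiter = FixedWindowLimiter(limit=100, window_size=60)
before = sum(limiter.allow("client-a", now=119.5) for _ in range(100))
after = sum(limiter.allow("client-a", now=120.0) for _ in range(100))
print(before + after)  # 200
```

Old (key, window) entries are never deleted in this sketch; a real store would expire them, for example with a TTL of one window.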

Sliding Window

Estimates the request count over the last full window as a weighted combination of the previous and current fixed-window counts. This approximates a true sliding window without storing individual request timestamps.

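import time

# previous_count and current_count come from your counter store,
# e.g. get_count(key, previous_window) and get_count(key, current_window)
def effective_count(previous_count, current_count, window_size):
    # Seconds elapsed in the current window (windows aligned to the Unix epoch)
    elapsed = time.time() % window_size
    # Fraction of the previous window still covered by the sliding window
    overlap = (window_size - elapsed) / window_size
    # Weighted rate = previous_count * overlap + current_count
    return previous_count * overlap + current_count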

Pro: Smooth. No boundary spikes. Low memory.

Con: Approximation, not exact. Usually close enough.
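A sketch of a complete sliding window counter built on that formula, assuming in-memory dictionary storage rather than Redis (the class name `SlidingWindowLimiter` is hypothetical). It replays the boundary burst described under Fixed Window, with a minute boundary at t = 120: the first request at the boundary sees the previous window's 100 requests at full weight and is rejected, so the burst is capped at 100 instead of 200.

```python
import time


class SlidingWindowLimiter:
    """Sliding window counter: estimates the last full window from two fixed-window counts."""

    def __init__(self, limit, window_size=60):
        self.limit = limit              # max requests per sliding window
        self.window_size = window_size  # window length in seconds
        self.counts = {}                # (key, window_index) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_size)   # epoch-aligned window index
        elapsed = now % self.window_size        # seconds into the current window
        overlap = (self.window_size - elapsed) / self.window_size
        previous = self.counts.get((key, window - 1), 0)
        current = self.counts.get((key, window), 0)
        # Weighted rate = previous_count * overlap + current_count
        effective = previous * overlap + current
        # Reject if admitting this request would push the estimate over the limit
        if effective + 1 > self.limit:
            return False
        self.counts[(key, window)] = current + 1
        return True


limiter = SlidingWindowLimiter(limit=100, window_size=60)
before = sum(limiter.allow("client-a", now=119.5) for _ in range(100))
after = sum(limiter.allow("client-a", now=120.0) for _ in range(100))
halfway = sum(limiter.allow("client-a", now=150.0) for _ in range(100))
print(before, after, halfway)  # 100 0 50
```

Thirty seconds into the new window the previous count is weighted by 0.5, so the estimate starts at 50 and exactly 50 more requests are admitted. Capacity returns gradually instead of all at once on the boundary.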

What We Use

At Anethoth, all four of our APIs use a simple per-key-per-minute counter stored in memory (we're single-process Python apps with SQLite). When we scale to multiple processes, we'll move to Redis-backed sliding windows.

The headers we return on every response:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1713456000

When rate limited, we return 429 Too Many Requests with a Retry-After header. The error message includes the limit, the reset time, and a link to pricing for higher limits.
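A sketch of how those headers and the 429 might be computed for a per-minute counter, assuming epoch-aligned windows and that `used` already includes the current request (the function name and signature are hypothetical, not taken from our code). `Retry-After` is sent as a whole number of seconds, one of the two forms HTTP allows (the other is an HTTP date), rounded up so clients never retry early.

```python
import math
import time


def rate_limit_response(limit, used, window_size=60, now=None):
    """Return (status_code, headers) for a fixed, epoch-aligned window.

    `used` is the number of requests made in the current window, including this one.
    """
    now = time.time() if now is None else now
    # The current window ends at the next multiple of window_size
    reset_at = (int(now // window_size) + 1) * window_size
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, limit - used)),
        "X-RateLimit-Reset": str(reset_at),
    }
    if used > limit:
        # Whole seconds until the window resets, at least 1
        headers["Retry-After"] = str(max(1, math.ceil(reset_at - now)))
        return 429, headers
    return 200, headers


status, headers = rate_limit_response(limit=60, used=13, now=1713455990.5)
print(status, headers["X-RateLimit-Remaining"], headers["X-RateLimit-Reset"])  # 200 47 1713456000
```

With a limit of 60, 13 requests used, and a clock 9.5 seconds before the minute boundary at 1713456000, this reproduces the example headers above; a 61st request at the same moment would get a 429 with `Retry-After: 10`.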

Common Mistakes

Rate limiting by IP address alone. Shared IPs (corporate offices, VPNs, cloud providers) mean many users share one limit. Always rate limit by API key when possible, with IP-based limiting as a fallback for unauthenticated endpoints.
Silent rate limiting. Dropping requests without explanation is hostile. Always return clear 429 responses with headers that tell the client when they can retry.
Overly aggressive limits. If your free tier allows 10 requests per minute, you will spend more time handling support tickets about rate limiting than you save in server costs. Be generous with limits, strict with enforcement.
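The key-first, IP-fallback rule can be as small as a helper that picks the identity each request is counted against. This is a sketch: the function name and the `key:`/`ip:` prefixes are assumptions, and the prefixes simply keep the two namespaces from colliding in a shared counter store. It also assumes the caller has already extracted the API key and client IP from the request.

```python
def rate_limit_key(api_key, client_ip):
    """Choose the identity a request is rate limited by.

    Authenticated requests are counted per API key, so users behind a shared IP
    (office NAT, VPN, cloud egress) don't exhaust each other's quota.
    Unauthenticated requests fall back to the client IP.
    """
    if api_key:
        return f"key:{api_key}"
    return f"ip:{client_ip}"


# Two customers behind the same corporate IP get separate limits
print(rate_limit_key("k_alice", "203.0.113.7"))  # key:k_alice
print(rate_limit_key("k_bob", "203.0.113.7"))    # key:k_bob
print(rate_limit_key(None, "203.0.113.7"))       # ip:203.0.113.7
```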

Rate limiting is infrastructure that your users should almost never think about. If they notice it, your limits are too low or your errors are too opaque. The best rate limiter is invisible.
