Why Your Caching Layer Has a Thundering Herd Problem: Expiry Stampedes and How to Prevent Them

Your cache hit rate is 99.7 percent. One day, a key expires. In the 800 milliseconds before the cache is repopulated, you get 400 requests for the same object — and each one, seeing a cache miss, fires a database query. You've just DDoS'd your own database with legitimate traffic.

This is the expiry stampede, also called cache thundering herd. It's distinct from the retry thundering herd (many clients retrying a failed service simultaneously). The mechanism is different, and so are the mitigations.

Why It Happens

The standard cache pattern is: check cache, miss, query DB, store result. Under low concurrency, this is fine. A single miss generates a single DB query. But under high concurrency against a single hot key, the 10 milliseconds between the first miss and the cache write is enough time for hundreds of parallel requests to also check the cache, also see a miss, and also fire DB queries. All of them return the same data. All of them write the same value to cache. Most of that work was redundant.

The worst case is a key that's popular enough that it never would have expired naturally — but its TTL was set at 3600 seconds, and 3600 seconds ago it was first created. When it expires, the thundering herd is proportional to the request rate at that moment.

Three Mitigation Patterns

1. Probabilistic Early Expiration (XFetch)

The XFetch algorithm, described by Vattani et al. (2015), solves this without locks. Each reader independently decides whether to refresh the cache based on a probability that increases as the key approaches expiry:

def get_with_xfetch(key, beta=1.0):
    value, expiry, delta = cache.get_with_metadata(key)
    # delta = time it took to compute the value last time
    # expiry = absolute Unix timestamp of expiry
    now = time.time()
    remaining = expiry - now
    # Refresh with increasing probability as expiry approaches
    if -delta * beta * math.log(random.random()) > remaining:
        value = recompute(key)
        cache.set(key, value, ttl=NEW_TTL, delta=recompute_time)
    return value

The probability of early refresh is near zero when the key is fresh and rises to near-certain as expiry approaches. Under high concurrency, a few requests will refresh slightly early, spreading the recomputation over time rather than concentrating it at expiry. No coordination required. Works well when you have thousands of keys with varying TTLs and can't predict which ones are hot.

2. Lock-Based Single-Flight

For a small number of known-hot keys, a distributed lock prevents the stampede:

def get_with_lock(key, ttl=3600):
    value = cache.get(key)
    if value is not None:
        return value

    lock_key = f"lock:{key}"
    acquired = redis.set(lock_key, "1", nx=True, ex=5)  # 5s lock timeout

    if acquired:
        try:
            value = db.query(key)
            cache.set(key, value, ttl=ttl)
            return value
        finally:
            redis.delete(lock_key)
    else:
        # Didn't get the lock — wait briefly and retry from cache
        time.sleep(0.05)
        return cache.get(key) or db.query(key)  # fallback if still missing

One request repopulates the cache; the others wait and then read the freshly populated value. The SETNX (set-if-not-exists) operation in Redis is atomic. The 5-second lock expiry is a safety valve in case the lock holder crashes before deleting the key.

The tradeoff: under extreme concurrency, the "wait and retry" path still hits the DB if the lock holder is slow. You can serve stale value instead of waiting — keep the old value around beyond its TTL, serve it while the lock holder refreshes in the background.

3. Background Refresh (Never-Expire Pattern)

For critical-path data where staleness is acceptable within a defined window:

# Store value with an internal soft-expiry but no Redis TTL (or very long TTL)
cache.set(key, {value: data, soft_expiry: now() + 300}, ttl=600)

def get_with_background_refresh(key):
    entry = cache.get(key)
    if entry and entry.soft_expiry < time.time():
        # Trigger async refresh, but serve current value now
        task_queue.enqueue(refresh_key, key)
    return entry.value if entry else db.query(key)

The key never has a cold expiry moment. Refresh happens asynchronously before the value is considered stale by callers. Requires knowing which keys are hot enough to maintain — this pattern only works for a defined set of critical keys, not broad cache.

What Doesn't Work

Jittering TTLs: Randomizing TTL values at cache write time (e.g., TTL = base_ttl + random(0, 300)) helps when the stampede is across many keys expiring simultaneously — a bulk-loaded cache where everything was written at the same time. It does not help with a single hot key. When that key expires, the stampede still happens — it just happens at a randomized time.

Longer TTLs: Delays the problem. The stampede still happens when the key eventually expires, and now the cached value is older. If the data changes frequently, longer TTLs mean you're serving staleness in exchange for stampede prevention — a real tradeoff, but not a solution.

Which Pattern for Which Problem

The right mitigation depends on the shape of your hot-key problem:

Broad cache, many keys, unpredictable hotness: Probabilistic early expiration (XFetch). No coordination overhead, works at scale, handles the long tail.
Small number of known-hot keys (< 50): Lock-based single-flight. Simple to reason about, easy to monitor (watch lock contention), serves stale-on-miss gracefully.
Critical-path data where a cache miss is unacceptable: Background refresh with never-expire. The highest operational complexity, but eliminates cold misses entirely for the keys that matter most.

Most production systems benefit from lock-based single-flight for their five to ten most critical keys, and probabilistic expiration or jitter for the rest. The truly never-expire pattern is usually overkill except for data that appears in every page load — user session data, global configuration, front-page content.

Building in public? builds.anethoth.com is a directory of software projects with public build dossiers — shipped milestones, known limitations, and proof the work is real.