Caching at the Edge: When CDN Rules Help and When They Hurt

An edge cache is a strange creature. It is not under your control in the way your application server is. It does not always tell you when it has cached something. It does not always honor the headers you send. It is operated by another team, often by another company, and the tools you have for inspecting its behavior are limited. And yet it is the single highest-leverage performance lever in most applications, because the request that never reaches your origin is the fastest request you can serve.

The promise of edge caching is simple: put a cache in front of your origin, set a few headers, watch the latency drop and the bandwidth bill shrink. The reality is that the headers interact in ways that are not obvious, and the failure modes when you get them wrong range from poor cache hit rates (which is annoying) to serving authenticated content to the wrong user (which is a security incident). It is worth getting the model right.

The three directives that matter most

The three Cache-Control directives that do almost all the work are max-age, s-maxage, and private versus public. Everything else is a refinement on top of these.

max-age tells any cache (browser or CDN) how long, in seconds, the response stays fresh. s-maxage overrides max-age for shared caches such as CDNs, which lets you cache longer at the edge than in the user's browser. private marks the response as intended for a single user, so shared caches must not store it; public explicitly allows shared caching even for responses that a shared cache would otherwise refuse to store (typically responses to requests that carry an Authorization header).

The most common mistake is sending Cache-Control: max-age=3600 on an authenticated API response without realizing that some CDN configurations will treat this as cacheable in their shared cache. The fix is explicit: Cache-Control: private, max-age=3600, or, more often, Cache-Control: no-store. The CDN should never have to guess.
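A minimal sketch of the "never make the CDN guess" rule, in Python. The function name and its parameters are hypothetical and not taken from any framework; the point is only that every response gets an explicit directive chosen from the authentication state.

```python
from typing import Optional


def cache_control(
    authenticated: bool,
    browser_ttl: int = 3600,
    edge_ttl: Optional[int] = None,
    allow_private_browser_cache: bool = False,
) -> str:
    """Return an explicit Cache-Control value (illustrative helper)."""
    if authenticated:
        # Per-user responses: either browser-only caching or no caching at all.
        if allow_private_browser_cache:
            return f"private, max-age={browser_ttl}"
        return "no-store"
    # Anonymous responses: allow shared caching, optionally longer at the edge.
    directives = ["public", f"max-age={browser_ttl}"]
    if edge_ttl is not None:
        directives.append(f"s-maxage={edge_ttl}")
    return ", ".join(directives)


print(cache_control(authenticated=True))
print(cache_control(authenticated=False, browser_ttl=60, edge_ttl=86400))
```

Defaulting authenticated responses to no-store is a deliberately conservative choice in this sketch; private with a max-age is the opt-in.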

The Vary header is load-bearing

The Vary header tells the cache which request headers should be part of the cache key. If your response varies by Accept-Encoding (because you sometimes serve gzip and sometimes plain), Vary should include Accept-Encoding. If your response varies by Accept-Language, Vary should include Accept-Language. If you forget to set Vary, the cache can serve a gzip response to a client that did not advertise gzip support, and that client will receive a body it cannot decode.

The non-obvious case is when the response varies by a custom header you have invented. If your API returns different content based on an X-API-Version header, the response must include Vary: X-API-Version or the cache will serve the v1 response to a v2 caller (or vice versa). The penalty for forgetting Vary on a custom header is not just stale content: the caller receives the wrong representation entirely, which downstream looks like data corruption. It is one of the most insidious bugs in caching because it manifests intermittently and only on cache hits.
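To make the failure concrete, here is a simplified model of how a cache might derive its key from Vary. Real CDNs normalize headers and build keys in their own ways, so the function and key format below are assumptions for illustration only.

```python
from typing import Mapping


def cache_key(method: str, url: str, request_headers: Mapping[str, str], vary: str) -> str:
    """Build a cache key from the URL plus the request headers named in Vary."""
    normalized = {name.lower(): value.strip() for name, value in request_headers.items()}
    parts = [method.upper(), url]
    # Header names are case-insensitive; sort them so the key is stable.
    for name in sorted(h.strip().lower() for h in vary.split(",") if h.strip()):
        parts.append(f"{name}={normalized.get(name, '')}")
    return "|".join(parts)


v1 = {"X-API-Version": "1"}
v2 = {"X-API-Version": "2"}

# Without Vary, both callers map to the same cache entry.
print(cache_key("GET", "/items", v1, "") == cache_key("GET", "/items", v2, ""))

# With Vary: X-API-Version, each version gets its own entry.
print(cache_key("GET", "/items", v1, "X-API-Version"))
print(cache_key("GET", "/items", v2, "X-API-Version"))
```

In this model the missing Vary header collapses two different responses onto one key, which is exactly the intermittent, hit-only bug described above.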

Stale-while-revalidate

The directive that most operators discover late and wish they had used earlier is stale-while-revalidate. It tells the cache that, after the response is no longer fresh, the cache may serve the stale version while it asynchronously fetches a fresh copy. The user gets a fast response from cache; the cache gets updated; the next user gets the fresh response.

The practical effect is that far fewer requests wait on the origin. Without stale-while-revalidate, the first request after a response expires triggers a synchronous origin fetch and the user waits for it. With stale-while-revalidate, as long as requests keep arriving inside the stale window, the only synchronous origin fetch is the very first one; later requests are served from cache while the cache refreshes in the background. If no request arrives before the stale window closes, the next request is a synchronous fetch again. The trade-off is that some users will see slightly stale data, by definition. For most applications (product pages, marketing pages, API documentation, public listings) this is the correct trade.

The directive looks like Cache-Control: max-age=60, stale-while-revalidate=300, which means "treat as fresh for 60 seconds, then serve stale for up to 5 more minutes while revalidating." Tuning the two numbers is the kind of thing you adjust based on how dynamic the data is.
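The two numbers define three windows in the life of a cached response. The sketch below classifies a response by its age; the function name and the three labels are made up for illustration, and real caches differ in how they treat the exact boundaries.

```python
def cache_decision(age_seconds: int, max_age: int, stale_while_revalidate: int) -> str:
    """Classify a cached response by age (simplified model of the directive)."""
    if age_seconds < max_age:
        # Still fresh: serve from cache, no origin contact.
        return "fresh"
    if age_seconds < max_age + stale_while_revalidate:
        # Stale but inside the window: serve it now, refresh in the background.
        return "serve-stale-and-revalidate"
    # Too old: the request has to wait for the origin.
    return "fetch-from-origin"


# Cache-Control: max-age=60, stale-while-revalidate=300
for age in (30, 120, 400):
    print(age, cache_decision(age, max_age=60, stale_while_revalidate=300))
```

Walking a few ages through the function is a quick way to sanity-check a proposed pair of values before shipping them.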

Cache invalidation

The hard problem in caching is invalidation. Every CDN provides a purge API; the question is whether you call it correctly when the underlying data changes. The honest answer for most operators is "sometimes." Purge endpoints are fired from application code on writes, the application code occasionally fails to fire them, and stale data persists until the TTL expires.

The most reliable pattern is to make the cache key include a version identifier that you bump on writes. If your product page is cached at /products/123 with a long TTL, you can prepend a version: /v42/products/123. When the product changes, you bump v42 to v43; the new URL is a cache miss, and the old URL simply expires. This is more work than calling purge, but it sidesteps a whole class of "did the purge fire?" bugs.

The version can come from a build artifact (good for static assets), a database revision (good for content that changes via the application), or a content hash (good for assets that change at unpredictable times). The principle is the same: cache invalidation is hard to get right reliably; cache-key rotation is much easier.
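A minimal sketch of key rotation, following the /v42/products/123 shape used above. Both helper names are hypothetical, and truncating the SHA-256 digest to 12 hex characters is an arbitrary choice for readability.

```python
import hashlib


def versioned_path(path: str, version: str) -> str:
    """Prefix a path with a version segment, e.g. /v42/products/123."""
    return f"/v{version}{path}"


def content_version(content: bytes, digest_chars: int = 12) -> str:
    """Derive a version from the content itself (truncated SHA-256 hex digest)."""
    return hashlib.sha256(content).hexdigest()[:digest_chars]


# Database revision as the version: bump it on every write.
print(versioned_path("/products/123", "42"))
print(versioned_path("/products/123", "43"))

# Content hash as the version: the URL changes only when the bytes change.
old = versioned_path("/app.css", content_version(b"body { color: black }"))
new = versioned_path("/app.css", content_version(b"body { color: navy }"))
print(old != new)
```

Whichever source the version comes from, the pages or clients that link to the resource need to pick up the new URL, which is where the extra work relative to a purge call goes.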

What not to cache

The temptation when you have an edge cache available is to cache everything. The discipline is to cache only what is actually safe to cache. Authenticated responses are not safe to cache by default; even if your CDN is configured to honor private/public correctly, one misconfigured route can leak across user boundaries. Responses that depend on geolocation are not safe to cache at the edge unless you are using an edge that understands geo-keying. Responses that include CSRF tokens are not safe to cache.

The principle is to cache things that are truly the same for all callers (or for all callers who share the relevant Vary header values), and to be paranoid about anything that is per-user. The simplest rule that works in practice: cache GET responses that have no authentication context, and do not cache anything else.
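The rule can be written as a small predicate. Treating the presence of an Authorization or Cookie request header as "authentication context" is an assumption of this sketch; an application that authenticates some other way (a custom token header, for example) would need to extend the list.

```python
from typing import Mapping

# Request headers assumed to signal authentication context (illustrative list).
AUTH_HEADERS = ("authorization", "cookie")


def is_edge_cacheable(method: str, request_headers: Mapping[str, str]) -> bool:
    """Cache only GET requests that carry no authentication context."""
    names = {name.lower() for name in request_headers}
    if method.upper() != "GET":
        return False
    return not any(header in names for header in AUTH_HEADERS)


print(is_edge_cacheable("GET", {"Accept": "text/html"}))
print(is_edge_cacheable("GET", {"Authorization": "Bearer abc"}))
print(is_edge_cacheable("POST", {"Accept": "application/json"}))
```

The predicate is intentionally strict: it will refuse to cache some responses that would have been safe, which is the cheaper mistake to make.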

Observability

The edge cache is invisible by default. Most CDNs provide a header on the response that tells you whether the request was a cache hit or miss: check for CF-Cache-Status on Cloudflare, X-Cache on Fastly and CloudFront, and similar headers elsewhere. Add a step to your CI that checks the cache hit rate on key endpoints over time; a sudden drop usually means someone changed Vary or Cache-Control without realizing the impact.
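A check like that starts with reading the status header. The sketch below works on a plain dictionary of response headers, so it can sit behind whatever HTTP client you already use. The normalization is a simplification: X-Cache values differ by vendor and can contain several entries when requests pass through more than one cache layer, so treat the substring matching as an assumption.

```python
from typing import Mapping, Optional


def cache_status(response_headers: Mapping[str, str]) -> Optional[str]:
    """Return a coarse cache status from common CDN headers, or None if absent."""
    headers = {name.lower(): value for name, value in response_headers.items()}
    if "cf-cache-status" in headers:
        # Cloudflare reports values such as HIT, MISS, EXPIRED, DYNAMIC.
        return headers["cf-cache-status"].strip().upper()
    if "x-cache" in headers:
        value = headers["x-cache"].upper()
        # Simplified: any mention of HIT counts as a hit.
        if "HIT" in value:
            return "HIT"
        if "MISS" in value:
            return "MISS"
        return value.strip()
    return None


print(cache_status({"CF-Cache-Status": "HIT"}))
print(cache_status({"X-Cache": "Miss from cloudfront"}))
print(cache_status({"Content-Type": "text/html"}))
```

A None result is worth alerting on too, since it usually means the request did not pass through the CDN at all.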

The logs from the CDN itself, if you can get them, are the ground truth. Most CDNs provide log delivery to S3 or similar, with per-request fields including cache status, response time, and whether the request was served by the edge or proxied to origin. Sampling these logs and computing hit rate by URL pattern is the simplest possible cache observability and it catches the majority of real problems.
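Computing hit rate by URL pattern takes only a few lines once the logs are parsed. The sketch below assumes each sampled record has already been reduced to a (path, cache status) pair and groups by path prefix; the record shape, the prefixes, and the "other" bucket are all hypothetical, since real log formats vary by CDN.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def hit_rate_by_prefix(
    records: Iterable[Tuple[str, str]], prefixes: Sequence[str]
) -> Dict[str, float]:
    """Compute the cache hit rate per URL prefix from (path, status) records."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for path, status in records:
        # First matching prefix wins; unmatched paths go to "other".
        prefix = next((p for p in prefixes if path.startswith(p)), "other")
        totals[prefix] += 1
        if status.upper() == "HIT":
            hits[prefix] += 1
    return {prefix: hits[prefix] / totals[prefix] for prefix in totals}


sample = [
    ("/static/app.css", "HIT"),
    ("/static/app.js", "HIT"),
    ("/static/logo.png", "MISS"),
    ("/api/users/1", "MISS"),
]
for prefix, rate in hit_rate_by_prefix(sample, ["/static/", "/api/"]).items():
    print(f"{prefix}: {rate:.0%}")
```

Comparing these per-prefix numbers between two time windows is usually enough to spot a regression in a single route group.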

The four APIs we run at DocuMint, CronPing, FlagBit, and WebhookVault use Caddy as their HTTP layer with Cloudflare in front. The static assets and marketing pages have aggressive cache headers; everything that touches authentication is explicitly marked no-store. The cache hit rate is something we check after every deploy, not because we expect it to drop, but because the cost of finding out two weeks later that we accidentally disabled caching is higher than the cost of a one-line deploy check.