Caching Strategies Every Small API Should Implement
Most caching advice is written for problems you do not have. Here is the practical caching playbook for an API that serves under a million requests a day — and why most of it is just HTTP headers.
Caching is one of those topics where the canonical advice was written for the wrong problem. Most articles you find are about distributed cache invalidation across regions, multi-tier eviction policies, or how some megacorp shaved 11ms off a 15ms request path. None of that matters when you are running a single-VPS API with a few thousand customers.
The actually useful caching playbook for a small API is much shorter, and most of it does not involve a cache at all. Here is what I have learned from running four production APIs that together handle a few hundred thousand requests a day.
The first cache is your HTTP response
Before you reach for Redis, look at the headers your API is sending. Cache-Control, ETag, and Last-Modified are the most underused performance tools in web development. They let the browser, the proxy, and the CDN do your caching for free, with no infrastructure to manage and no invalidation bugs to debug.
For an endpoint that returns reference data — list of timezones, list of countries, schema definitions, public configuration — set Cache-Control: public, max-age=86400 and walk away. The browser will not call you again for a day. If a CDN is in front, neither will it.
For an endpoint that returns user-specific data, use Cache-Control: private, max-age=60, must-revalidate with an ETag. Once the 60 seconds are up, the client makes a conditional request with If-None-Match, you compare it against the current ETag, and if nothing has changed you return 304 Not Modified with no body. Bandwidth is saved, and if the ETag comes from something cheap like a version column or an updated-at timestamp, most of the server work is skipped too. No cache infrastructure needed.
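Here is a minimal, framework-agnostic sketch of that conditional-request flow. The function names (make_etag, conditional_response) and the (status, headers, body) return shape are my own illustration, not any framework's API. It also assumes the ETag is a hash of the serialized body, which saves bandwidth but not the work of building the payload, and it ignores weak validators and the "*" form of If-None-Match.

```python
import hashlib
import json
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the serialized response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_response(payload: dict, if_none_match: Optional[str]):
    """Return (status, headers, body) for a user-specific endpoint.

    if_none_match is the raw If-None-Match request header, or None.
    """
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    etag = make_etag(body)
    headers = {
        "Cache-Control": "private, max-age=60, must-revalidate",
        "ETag": etag,
    }
    if if_none_match is not None:
        # The header may carry several comma-separated ETags.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates:
            return 304, headers, b""  # nothing changed: no body
    return 200, headers, body
```

In a real handler you would pass in the request's If-None-Match header and copy the returned headers onto the response. For reference data, the same shape works with Cache-Control: public, max-age=86400 and no ETag check at all.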
Database query caching is usually wrong
The instinct after profiling a slow endpoint is to wrap the query in a cache. Resist it. The right answer is almost always to fix the query.
Add the index. Move the join into a subquery. Materialize the aggregate. Denormalize the column. These fixes are permanent. A cache is a workaround that adds invalidation complexity and a new failure mode (stale data).
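To make "add the index" concrete, here is a small sketch using SQLite from the Python standard library. The orders table, its columns, and the index name are invented for illustration. The point is the workflow: ask the database for its query plan, add the index, and confirm the plan changed. The exact plan wording varies between SQLite versions, so the code only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT total FROM orders WHERE customer_id = ?"


def query_plan(connection, sql, params):
    """Return SQLite's plan description for a query as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY, (42,))  # expect a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY, (42,))   # expect an index search

print("before:", plan_before)
print("after: ", plan_after)
```

Other databases have their own equivalent (EXPLAIN in PostgreSQL and MySQL), and the habit is the same: read the plan before deciding the query cannot be made fast.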
Cache the query result only when (a) the query genuinely cannot be made fast enough — for example, full-text search across millions of rows — and (b) the data tolerates being slightly stale. Most CRUD queries do not satisfy either condition.
In-process caches beat external caches at small scale
If you do need an application cache, start with an in-process LRU. Python has functools.lru_cache in the standard library. Go has any of a dozen LRU libraries. Node has lru-cache. They need no extra infrastructure, add negligible locking overhead under typical loads, and are as fast as memory access.
The downside is that they are per-process: if you have three workers, you have three caches, and a cache miss in one does not benefit the others. For most small APIs, this is fine. The cost of a cache miss is one extra database query — not a system-degrading event.
You graduate to Redis when (a) you have multiple machines, (b) you need the cache to outlive a process restart, or (c) you need cache primitives that LRU does not give you (TTL per key, atomic increments, sorted sets). Until then, an in-process cache is simpler, faster, and one fewer thing to monitor.
Negative caching catches surprising bugs
If a key does not exist, cache that fact too. Otherwise a hot lookup for a missing record will hit the database every time. The classic case: a user enumeration script probing for emails that do not exist. Without negative caching, every probe hits the database; with it, a repeat probe for the same address returns from cache.
Set a short TTL on negative cache entries — 30 seconds is usually enough — so newly created records become visible quickly.
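A minimal sketch of negative caching with separate TTLs follows. TTLCache, find_user, and the 300-second positive TTL are assumptions of mine; only the 30-second negative TTL comes from the text above. The cache here is an unbounded dict for brevity, so a real version would also want a size cap.

```python
import time

_NOT_CACHED = object()  # returned when the cache has no live entry
_MISSING = object()     # stored when the database had no such record

POSITIVE_TTL = 300  # seconds; assumed value for found records
NEGATIVE_TTL = 30   # seconds; short, so new records show up quickly


class TTLCache:
    """Tiny per-process cache with per-entry expiry (not size-bounded)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop and report a miss
            return default
        return value

    def set(self, key, value, ttl):
        self._entries[key] = (self._clock() + ttl, value)


def find_user(email, cache, load_from_db):
    """Look up a user, caching both hits and confirmed absences."""
    cached = cache.get(email, _NOT_CACHED)
    if cached is _MISSING:
        return None  # known-absent: skip the database
    if cached is not _NOT_CACHED:
        return cached
    user = load_from_db(email)
    if user is None:
        cache.set(email, _MISSING, NEGATIVE_TTL)
    else:
        cache.set(email, user, POSITIVE_TTL)
    return user
```

The two sentinels matter: without them, "not in the cache" and "cached as absent" both look like None and the negative entry never takes effect.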
Idempotency keys are a kind of cache
For write endpoints, you want at-most-once semantics under retry. The pattern: the client sends an Idempotency-Key header, you cache the response under that key for some window (often 24 hours), and on a duplicate request you return the cached response without re-executing the operation.
This is a write cache, but the rules are different. You must store the full response, not just the result. You must store it before returning, not after. And you must include the request hash, so two different requests with the same key fail loudly instead of returning the wrong answer.
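Those three rules can be sketched roughly as below. Everything named here (handle_idempotent, request_fingerprint, IdempotencyConflict, the dict-like store) is hypothetical. The sketch leaves out the expiry window and does not handle two concurrent requests arriving with the same key, both of which a production version needs.

```python
import hashlib


class IdempotencyConflict(Exception):
    """The same key was reused with a different request."""


def request_fingerprint(method: str, path: str, body: bytes) -> str:
    """Hash the parts of the request that define 'the same request'."""
    digest = hashlib.sha256()
    for part in (method.encode(), path.encode(), body):
        # Length-prefix each part so field boundaries cannot blur.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


def handle_idempotent(store, key, method, path, body, execute):
    """Run execute() at most once per key; replay the saved response after.

    store is any dict-like object; execute returns the full response,
    for example a (status, headers, body) tuple.
    """
    fingerprint = request_fingerprint(method, path, body)
    saved = store.get(key)
    if saved is not None:
        if saved["fingerprint"] != fingerprint:
            raise IdempotencyConflict(key)  # fail loudly, never guess
        return saved["response"]
    response = execute()
    # Store the full response before returning it to the client.
    store[key] = {"fingerprint": fingerprint, "response": response}
    return response
```

A handler would typically translate IdempotencyConflict into a 4xx response rather than let it propagate.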
Cache the expensive thing, not the whole response
If an endpoint does ten cheap things and one expensive thing, cache the expensive thing alone. Wrapping the entire endpoint in a cache means small changes to other parts of the response invalidate the cache pointlessly.
For example, an endpoint that returns user data plus a freshness-sensitive activity feed should cache the user record (changes rarely) separately from the feed (changes constantly). The composite response assembles fresh.
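A sketch of that split, with counters standing in for real queries. The function names and the shape of the response are made up for illustration. The cached function returns a tuple rather than a dict, because lru_cache hands back the same object on every hit and a mutable value could be changed by one caller and seen by the next.

```python
from functools import lru_cache

CALLS = {"user": 0, "feed": 0}  # counts simulated backend calls


@lru_cache(maxsize=1024)
def load_user_record(user_id: int) -> tuple:
    """Changes rarely: cached. Returns an immutable (id, name) tuple."""
    CALLS["user"] += 1
    return (user_id, f"user-{user_id}")


def load_activity_feed(user_id: int) -> list:
    """Changes constantly: deliberately not cached."""
    CALLS["feed"] += 1
    return [f"event-{CALLS['feed']}"]


def user_overview(user_id: int) -> dict:
    """Assemble the composite response fresh on every request."""
    uid, name = load_user_record(user_id)
    return {"id": uid, "name": name, "feed": load_activity_feed(user_id)}


first = user_overview(7)
second = user_overview(7)
print(CALLS)
```

Two requests cost one user lookup and two feed lookups, and a change to the feed never forces the user record to be reloaded.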
The cache that hurt us most was the one we forgot
The cache failure I have seen most often in small APIs is not "cache miss rate too high" or "memory blowout." It is "we cached this six months ago, and now we cannot figure out why this user is seeing stale data."
The remedies: (a) Put every cache behind a single named function, never inline. (b) Log cache hits and misses with the cached key, at debug level. (c) Have an admin endpoint to invalidate by key prefix. (d) Set TTLs on everything, even data you think is permanent.
If you only do one of these, do the last one. A cache with no TTL is a memory leak waiting to become a bug report.
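The four remedies fit in a few dozen lines. The sketch below is one possible shape, not what any particular library provides: NamedCache, cached_user_plan, and the "user_plan:" key prefix are all hypothetical, and the dict is unbounded for brevity.

```python
import logging
import time

log = logging.getLogger("cache")


class NamedCache:
    """Per-process cache where every entry has a TTL."""

    def __init__(self, default_ttl, clock=time.monotonic):
        self._default_ttl = default_ttl
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader, ttl=None):
        entry = self._entries.get(key)
        if entry is not None and self._clock() < entry[0]:
            log.debug("cache hit key=%s", key)
            return entry[1]
        log.debug("cache miss key=%s", key)
        value = loader()
        lifetime = self._default_ttl if ttl is None else ttl
        self._entries[key] = (self._clock() + lifetime, value)
        return value

    def invalidate_prefix(self, prefix):
        """Drop every key starting with prefix; return how many were dropped."""
        doomed = [k for k in self._entries if k.startswith(prefix)]
        for k in doomed:
            del self._entries[k]
        return len(doomed)


_cache = NamedCache(default_ttl=300)


def cached_user_plan(user_id, load_plan):
    """The single named entry point for this cache; callers never touch _cache."""
    return _cache.get_or_load(f"user_plan:{user_id}", lambda: load_plan(user_id))
```

An admin endpoint would then call something like _cache.invalidate_prefix("user_plan:") behind whatever authentication you already have. Six months later, grepping for cached_user_plan finds every place the cached value is read.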
What we run in production
Across our four APIs — DocuMint for PDF generation, CronPing for cron monitoring, FlagBit for feature flags, WebhookVault for webhook capture — the caching footprint is:
- Cache-Control headers on every static and reference endpoint
- An in-process LRU on hot lookups (rate-limit counters, key validation)
- Idempotency-key support on payment-adjacent write endpoints
- No Redis. No Memcached. No external cache infrastructure.
This handles current traffic with database CPU under 5%. We will reach for Redis when we need to. We have not needed to.
Premature caching is a real cost
Caching adds correctness risk. A bug in your cache layer can serve wrong data to paying customers, and stale-data bugs are notoriously hard to reproduce because they depend on timing. Every cache layer is a potential source of "it worked on my machine."
The right time to add a cache is when you have evidence that the underlying operation is the bottleneck and cannot be made faster, and when staleness is acceptable for the read in question. Until then, the cache you do not have cannot lie to your customers.