Rate Limiting a Small SaaS: When the App Layer Is Enough

Your SaaS doesn't need a distributed rate limiter. Not yet.

The tutorials assume you're defending against sophisticated clients hammering a high-traffic API. You have one FastAPI app, one SQLite database, and maybe two hundred users. The threat model is different: a misconfigured integration loop, a curious developer, or the occasional misbehaving script.

Here's the stack that fits.

In-Process Counter With TTL

The simplest approach is an in-process dict with expiry. Keep a counter per (IP, endpoint, minute). If it exceeds the threshold, return 429.

```python
import time

# key -> (request count, window start time)
_counters: dict[str, tuple[int, float]] = {}


def check_rate_limit(key: str, limit: int, window: int = 60) -> bool:
    """Return True if the request identified by `key` is within the limit."""
    now = time.time()
    count, ts = _counters.get(key, (0, now))
    if now - ts > window:
        # The previous window has expired; start a new one.
        count, ts = 0, now
    count += 1
    _counters[key] = (count, ts)
    return count <= limit
```

This is about ten lines including imports. It works well in a single-process server. The ceiling: it resets on restart, it doesn't work across multiple workers, and the dict grows without bound unless you periodically expire old keys.

Fix the dict growth with a quick purge on every hundredth call. Fix the multi-worker problem with a SQLite table.

SQLite-Backed Rate Limiting

If you're already using SQLite, add a rate_limits table:

```sql
CREATE TABLE rate_limits (
    key TEXT NOT NULL,
    window_start INTEGER NOT NULL,
    count INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (key, window_start)
);
```

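On every request, write the row for the truncated-to-minute timestamp with count + 1. INSERT OR REPLACE overwrites the whole row, so it has to read the old count in a subquery; an upsert (INSERT ... ON CONFLICT ... DO UPDATE SET count = count + 1, available in SQLite 3.24 and later) increments in place. If the count exceeds the limit, return 429. Wrap the write and the read in one transaction, and SQLite's serialized writes keep concurrent updates correct.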

This survives restarts and works across worker processes on the same machine. For a single-node app with WAL mode, the write overhead is small next to the rest of the request, though the exact cost depends on your disk and your synchronous setting.
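Here is one way the table-backed check could look with Python's standard `sqlite3` module. It is a sketch under stated assumptions: it uses an upsert to do the increment in a single statement (this needs SQLite 3.24 or newer), and the helper names `open_db` and `check_rate_limit` and the `now` parameter are illustrative.

```python
import sqlite3
import time
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS rate_limits (
    key TEXT NOT NULL,
    window_start INTEGER NOT NULL,
    count INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (key, window_start)
)
"""


def open_db(path: str) -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are started and ended explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(SCHEMA)
    return conn


def check_rate_limit(
    conn: sqlite3.Connection,
    key: str,
    limit: int,
    window: int = 60,
    now: Optional[float] = None,  # injectable clock, assumed here for testing
) -> bool:
    """Increment the counter for the current window; True if within limit."""
    now = time.time() if now is None else now
    window_start = int(now) // window * window  # truncate to window boundary
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        conn.execute(
            "INSERT INTO rate_limits (key, window_start, count) "
            "VALUES (?, ?, 1) "
            "ON CONFLICT(key, window_start) DO UPDATE SET count = count + 1",
            (key, window_start),
        )
        (count,) = conn.execute(
            "SELECT count FROM rate_limits WHERE key = ? AND window_start = ?",
            (key, window_start),
        ).fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return count <= limit
```

Rows for past windows accumulate, so a real deployment would also delete old `window_start` values from time to time.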

Where the Ceiling Is

The app-layer approach breaks down at roughly fifty requests per second sustained, or when you need limits that survive across multiple servers. It also has a blind spot: it can't rate-limit before the request reaches your Python process, which means a flood of slow connections can still exhaust your thread pool.

That's where the reverse proxy layer earns its place. Caddy's rate_limit directive (via the community module) or nginx's limit_req_zone act before your application code runs. The proxy rejects the excess request before it is ever forwarded: no thread, no handler, no database write.

The right architecture for a small SaaS: app-layer limits for business logic (per-user quotas, per-endpoint fairness), proxy-layer limits for flood protection (connections per second per IP). They solve different problems.
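On the app-layer side, per-user quotas and per-endpoint fairness mostly come down to how you build the key and which limit you look up. A small sketch follows; the endpoint paths, the numbers, and the helper names `limit_for` and `rate_key` are all made up for illustration.

```python
from typing import Optional

# Hypothetical per-endpoint limits (requests per window); tune for your app.
ENDPOINT_LIMITS: dict[str, int] = {
    "/submit": 10,
    "/healthz": 600,
}
DEFAULT_LIMIT = 60


def limit_for(endpoint: str) -> int:
    """Look up the limit for an endpoint, falling back to a default."""
    return ENDPOINT_LIMITS.get(endpoint, DEFAULT_LIMIT)


def rate_key(user_id: Optional[str], ip: str, endpoint: str) -> str:
    """Key on the authenticated user when known, otherwise on the IP."""
    subject = f"user:{user_id}" if user_id else f"ip:{ip}"
    return f"{subject}|{endpoint}"
```

The resulting key and limit can be passed straight to whichever counter you use, in-process or SQLite-backed.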

What Not to Do

Don't use per-request database queries for rate limiting if your limits require sub-second windows: the write-per-request pattern works fine at one-per-minute granularity but becomes a bottleneck at ten-per-second. Don't apply a single global limit across all endpoints; your submission form and your healthz endpoint have very different legitimate usage patterns. Don't rely on X-Forwarded-For unless you have verified that your proxy sets it and that requests cannot reach the app without passing through that proxy; a client that sets its own header can trivially bypass IP-based limits.
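The X-Forwarded-For point can be sketched as follows. This assumes a single reverse proxy on the same host that appends the address it saw to the header (as nginx's `$proxy_add_x_forwarded_for` does); `TRUSTED_PROXIES` and `client_ip` are illustrative names.

```python
import ipaddress
from typing import Optional

# Assumption: the only proxy in front of the app runs on localhost.
TRUSTED_PROXIES = [
    ipaddress.ip_network("127.0.0.1/32"),
    ipaddress.ip_network("::1/128"),
]


def client_ip(peer_ip: str, forwarded_for: Optional[str]) -> str:
    """Return the address to rate-limit on.

    The header is honored only when the direct peer is a trusted proxy.
    """
    peer = ipaddress.ip_address(peer_ip)
    if forwarded_for and any(peer in net for net in TRUSTED_PROXIES):
        # The rightmost entry is the one our own proxy appended; anything
        # to its left was supplied by the client and is not trustworthy.
        candidate = forwarded_for.split(",")[-1].strip()
        try:
            return str(ipaddress.ip_address(candidate))
        except ValueError:
            pass  # malformed header: fall back to the peer address
    return peer_ip
```

A request that reaches the app directly keeps its socket address as the key, no matter what header it sends.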

The goal isn't an airtight guarantee. It's a reasonable signal that stops the obvious accidents and slows down the determined ones long enough to notice.
