Feature Flag Evaluation at the Edge: Latency, Consistency, and the Honest Trade-Offs

Feature flag evaluation at the edge — meaning at a CDN's edge nodes or at a serverless function close to the user — has become the marketing pitch for nearly every flag platform in the last five years. The promise is straightforward: instead of asking your central flag service "is feature X enabled for user Y?" on every request and paying the latency, you fetch the rule set once, push it to the edge, and answer the question locally in microseconds. The numbers in the marketing decks are real. The trade-offs are also real, and most teams reach for the edge before they need it.

This piece walks through what edge evaluation actually buys you, what it costs, and the patterns that make it work in production rather than in benchmarks.

What edge evaluation actually buys

The latency win is the headline. A central evaluation call typically costs 20-100 ms depending on geography, network conditions, and whether the flag service is in the same region as the calling service. If you evaluate a flag inside a hot path — a request handler, a page render, an API endpoint — that latency is paid by every user on every request. At the edge, the same evaluation is typically under a millisecond, because the rule set is already in memory.

The second win is reliability. If your flag service is in us-east-1 and your application servers are in eu-west-1, every flag evaluation is a cross-region round trip. If us-east-1 has a bad day, your flag service goes down, and now you have a choice between failing closed (which can take down everything that touches a flag), failing open (which can release in-progress features), or caching aggressively (which is what edge evaluation already is). Edge evaluation makes the cache the source of truth at the edge, so a central outage stops being a hot-path failure.
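The three failure options can be sketched in a few lines. This is a minimal illustration, not any platform's API: `LastKnownGood`, `fetch_central`, and the `default` parameter are hypothetical names, and `fetch_central` is assumed to raise when the central service is unreachable.

```python
from typing import Callable, Dict


class LastKnownGood:
    """Wraps a central lookup and remembers the last value seen per flag."""

    def __init__(self, fetch_central: Callable[[str], bool]) -> None:
        # fetch_central is a hypothetical callable that asks the central
        # flag service and raises on timeout or outage.
        self._fetch_central = fetch_central
        self._cache: Dict[str, bool] = {}

    def is_enabled(self, flag: str, default: bool = False) -> bool:
        try:
            value = self._fetch_central(flag)
        except Exception:
            # Central service unreachable: serve the cached value if there
            # is one, otherwise fall back to the caller's default
            # (default=False fails closed, default=True fails open).
            return self._cache.get(flag, default)
        self._cache[flag] = value
        return value
```

The cache here only covers flags that were evaluated at least once before the outage. Edge evaluation goes a step further by shipping the whole rule set ahead of time.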

The third win, less often mentioned, is cost. If you have 100 million flag evaluations per day and your central service charges per evaluation, edge evaluation collapses that bill to whatever it costs to push rule updates plus the overhead of the edge runtime. For high-throughput applications, this can be the dominant economic argument.

What edge evaluation costs

The cost is consistency. The moment you push a rule set to the edge, you have created a distributed cache invalidation problem. When you toggle a flag in your central dashboard, that toggle has to propagate to every edge node before every user sees the change, and until it does, users routed to stale nodes keep getting the old value. The propagation takes time. For typical CDN-based edge platforms, full propagation takes 10-60 seconds. For some serverless platforms, it can be longer if the rule fetch is on a TTL rather than push-based.
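To make the TTL case concrete, here is a minimal sketch of a TTL-based rule cache. `TtlRuleCache`, `fetch_rules`, and the injectable `clock` are hypothetical names chosen for illustration. The point is that a toggle made centrally stays invisible to this node until the TTL expires.

```python
import time
from typing import Any, Callable, Dict, Optional


class TtlRuleCache:
    """Holds a rule set in memory and refetches it once the TTL has expired."""

    def __init__(
        self,
        fetch_rules: Callable[[], Dict[str, Any]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # fetch_rules is a hypothetical callable that downloads the current
        # rule set from the central service.
        self._fetch_rules = fetch_rules
        self._ttl = ttl_seconds
        self._clock = clock
        self._rules: Optional[Dict[str, Any]] = None
        self._fetched_at = 0.0

    def rules(self) -> Dict[str, Any]:
        now = self._clock()
        # Refetch on first use or when the cached copy is older than the TTL.
        if self._rules is None or now - self._fetched_at >= self._ttl:
            self._rules = self._fetch_rules()
            self._fetched_at = now
        return self._rules
```

With `ttl_seconds=30`, the worst-case staleness on a single node is 30 seconds plus the fetch time. A push-based platform replaces the polling with a notification, but the window between toggle and delivery still exists.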

This means a flag toggle that you intend as instantaneous is actually a window during which different users see different states of the flag. For most feature flags this is fine — a 30-second window of inconsistent rollout for a UI experiment is below the noise floor. For some flags it is not fine. A kill switch that has to disable a buggy feature immediately is exactly the case where you do not want a 30-second propagation window.

The honest answer is that edge evaluation works for slow-changing flags and is wrong for fast-changing ones. A useful split is to evaluate at the edge by default and to provide a "force evaluate centrally" path for flags marked as kill-switches or for evaluations where the calling code knows it needs current truth.
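A minimal sketch of that split, assuming each rule carries a hypothetical `kill_switch` marker and the caller can pass a hypothetical `force_central` argument. `fetch_central` stands in for whatever call reaches the central service.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FlagRule:
    enabled: bool
    kill_switch: bool = False  # hypothetical marker set when the flag is created


def evaluate(
    flag: str,
    edge_rules: Dict[str, FlagRule],
    fetch_central: Callable[[str], bool],
    force_central: bool = False,
) -> bool:
    rule = edge_rules.get(flag)
    if rule is None:
        return False  # unknown flag: treat as off
    if rule.kill_switch or force_central:
        # Pay the round trip to get current truth.
        return fetch_central(flag)
    # Default path: answer from the local rule set.
    return rule.enabled
```

The design choice is that the slow path is opt-in per flag or per call, so the latency cost lands only on the evaluations that need current truth.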

The targeting rule problem

The simplest edge evaluation cases — boolean flags, percentage rollouts based on a hash of the user ID — are trivially edge-friendly. The rule set fits in a few kilobytes, and the evaluation is a hash and a comparison.
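As an illustration of the hash-and-compare case, here is one common way to do a percentage rollout. The choice of SHA-256, the 10,000-bucket granularity, and the function names are assumptions for this sketch, not a description of any particular platform.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in [0, 10000)."""
    # Including the flag name in the hash input means a user who lands in
    # the first 10% for one flag is not automatically in it for every flag.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000


def in_rollout(flag: str, user_id: str, percent: float) -> bool:
    """True if the user falls inside the first `percent` of buckets."""
    return bucket(flag, user_id) < percent * 100
```

Because the bucket depends only on the flag name and user ID, every edge node gives the same answer for the same user without coordination, and raising the percentage only adds users rather than reshuffling them.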

The harder cases involve targeting rules that depend on user attributes. "Enable feature X for users in the EU who are on a paid plan and signed up after January 1." Now the edge needs to know, for each request, the user's country, plan, and signup date. If those attributes are passed in the request (via JWT, headers, or a session lookup), edge evaluation works fine. If they require a database lookup to get, then your "edge" evaluation is now an edge-side database call, which defeats most of the purpose.
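When the attributes do arrive with the request, the evaluation itself stays simple. The sketch below models the example rule with hypothetical `UserContext` and `TargetingRule` types. The country list is an illustrative subset rather than the full EU membership, and the year is an assumption because the example names only January 1.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet


@dataclass(frozen=True)
class UserContext:
    """Attributes assumed to arrive with the request (JWT claims, headers)."""
    country: str  # two-letter country code
    plan: str
    signup_date: date


@dataclass(frozen=True)
class TargetingRule:
    countries: FrozenSet[str]
    plans: FrozenSet[str]
    signed_up_after: date

    def matches(self, user: UserContext) -> bool:
        # All three conditions must hold; no network or database access.
        return (
            user.country in self.countries
            and user.plan in self.plans
            and user.signup_date > self.signed_up_after
        )


# Illustrative values only: a subset of EU countries and an assumed year.
EXAMPLE_RULE = TargetingRule(
    countries=frozenset({"DE", "FR", "NL"}),
    plans=frozenset({"paid"}),
    signed_up_after=date(2024, 1, 1),
)
```

Everything `matches` needs is already in memory, which is what keeps this an edge-friendly evaluation. The moment one of those fields has to be looked up, the cost moves from the rule to the lookup.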

The pattern that scales is to push attributes into the edge along with the rule set: for each user, a small JSON blob of evaluation-relevant attributes that lives at the edge and is updated on a schedule. This works as long as the attribute set is small and the user count is small enough for the whole table to fit in an edge node's memory. For a B2B product with 10,000 customers each having 50 attributes, this is a few megabytes per edge node, which is fine. For a B2C product with 100 million users, it is not, and you are back to passing attributes in each request.
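The sizing argument is easy to check with back-of-the-envelope arithmetic. The figure of 10 bytes per attribute is an assumption for a compact encoding (verbose JSON with key names would be larger), and the B2C case assumes the same 50 attributes per user, which the example above does not specify.

```python
def attribute_store_bytes(users: int, attrs_per_user: int, bytes_per_attr: int) -> int:
    """Rough size of a per-user attribute table held at each edge node."""
    return users * attrs_per_user * bytes_per_attr


BYTES_PER_ATTR = 10  # assumption: average encoded size of one attribute

b2b = attribute_store_bytes(10_000, 50, BYTES_PER_ATTR)
b2c = attribute_store_bytes(100_000_000, 50, BYTES_PER_ATTR)

print(f"B2B: {b2b / 1e6:.0f} MB per edge node")  # 5 MB
print(f"B2C: {b2c / 1e9:.0f} GB per edge node")  # 50 GB
```

Under these assumptions the B2B table is a rounding error in edge memory, while the B2C table is four orders of magnitude larger and has to be replicated to every node.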

Audit trails and analytics

Central flag evaluation has a useful side effect: every evaluation goes through the central service, which can record which user saw which flag value at which time. This is essential for analytics, A/B testing, and post-incident debugging — "did this user see the new feature when the bug occurred?"

Edge evaluation breaks this, because the record no longer comes for free. Every edge node has to ship its evaluation events back to a central collector, and the collector has to handle the firehose. For high-throughput applications, this can be more data than the original evaluation traffic. The pattern is to sample (record 1% of evaluations, or only evaluations of flags marked as analytics-relevant) and to accept that you have lost full audit-trail visibility for the rest. For some teams this is fine; for compliance-heavy environments it is a problem worth being explicit about.
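A minimal sketch of the sampling pattern, combining both variants: flags in a hypothetical `always_record` set are recorded every time, and everything else is recorded at `sample_rate`. The class name, the event fields, and the in-memory `buffer` are illustrative. A real edge node would flush the buffer to the collector in batches.

```python
import random
from typing import Callable, Dict, List, Set


class SampledRecorder:
    """Buffers a sample of evaluation events for shipment to a collector."""

    def __init__(
        self,
        sample_rate: float,
        always_record: Set[str],
        rng: Callable[[], float] = random.random,
    ) -> None:
        self._sample_rate = sample_rate
        self._always_record = always_record
        self._rng = rng
        self.buffer: List[Dict[str, object]] = []

    def record(self, flag: str, user_id: str, value: bool) -> bool:
        """Record the event if it is selected; return whether it was kept."""
        always = flag in self._always_record
        keep = always or self._rng() < self._sample_rate
        if keep:
            self.buffer.append({
                "flag": flag,
                "user_id": user_id,
                "value": value,
                # Stored so the collector can scale sampled counts back up.
                "sample_rate": 1.0 if always else self._sample_rate,
            })
        return keep
```

Random sampling answers aggregate questions but not "did this specific user see the feature?", which is the gap that matters for the compliance case.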

Where this fits in our stack

Across our four products (DocuMint, CronPing, FlagBit, and WebhookVault), we evaluate flags centrally, with the flag service on the same machine that runs the application. The latency cost is essentially zero because the call never leaves the host. We do not need edge evaluation, and we have not built it. FlagBit itself supports both modes: a central API for evaluation and a downloadable rule-set bundle for clients that want to evaluate locally. The downloadable bundle is the right answer for many teams; the edge is the right answer for fewer than the marketing suggests.

The deeper lesson is that the latency-versus-consistency trade-off is the central trade-off in distributed systems, and feature flags are a small enough surface that you can make a clean choice rather than reaching for the most complex solution by default. If flag evaluation is taking a measurable share of your request budget, push it to the edge. If it is not, keep it central, keep your audit trail intact, and spend your complexity budget somewhere else.