Database Replication Lag: How to Handle It Without Lying to Users

Read replicas are the most common scaling pattern for read-heavy databases, and they introduce a subtle bug that almost every team encounters: a user performs a write, immediately reads, and sees the old data. The classic case is updating a profile setting and being shown the previous value on the very next page load. The bug is not a code bug; it is the inherent consequence of asynchronous replication, where the write went to the primary and the read went to a replica that has not yet caught up.

The patterns that handle this correctly are not exotic. They require an honest accounting of where each query goes, a few well-placed routing rules, and a willingness to acknowledge that some reads should not go to replicas. This piece walks through the patterns that survive production traffic without lying to users about the state of their own data.

Why replication is asynchronous

The first thing to understand is that synchronous replication, where every write blocks until the replica acknowledges it, buys consistency at a latency cost that is usually unacceptable. Every write now incurs a network round-trip to the replica, and if a synchronous replica is slow or unreachable, every write stalls with it. Synchronous replication is correct in the textbook sense and operationally toxic in most production systems. PostgreSQL supports synchronous replication, and many teams that try it turn it off the first time a slow replica makes the primary effectively unwritable.

Asynchronous replication, where the primary commits the write locally and ships it to replicas in the background, has the inverse trade: writes are fast, but reads from replicas can lag the primary by milliseconds to seconds depending on load and network. The lag is bounded in steady state — typical PostgreSQL replication lag under normal load is under 100ms — but under heavy write load or during replica catch-up after a failure, lag can spike to minutes.

The pragmatic position is that asynchronous replication is correct and the lag is a feature of the system rather than a bug. The application's job is to know where to route each query so that the lag does not surprise users.

The read-your-writes pattern

The most important routing rule is that a user who just wrote should read from the primary on subsequent requests, at least for a short window. The rule is called read-your-writes consistency, and it is the difference between a database that feels honest and a database that feels haunted.

The implementation is a routing decision: after a write, the application records a "session-pinned-to-primary" marker for the user, with a TTL longer than expected replication lag (typically a few seconds). For the duration of that TTL, all queries from that user route to the primary. After the TTL expires, queries can fall back to replicas. The result is that the user sees their own writes immediately, while other users (who do not care) get the read scaling benefit of replicas.

The marker is usually stored in the user's session or in a short-lived cache keyed by user ID. The cost is negligible — one extra cache lookup per request — and the benefit is that the most common replica-lag bug is eliminated entirely.
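
The pin described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the class name PrimaryPin, the 5-second default TTL, and the in-process dictionary are all assumptions made for the example. A real deployment would more likely keep the marker in the session store or a shared cache so that every application server sees the same pin.

```python
import time


class PrimaryPin:
    """Tracks which users must read from the primary for a short window."""

    def __init__(self, ttl_seconds=5.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock            # injectable so tests can control time
        self._pinned_until = {}       # user_id -> expiry timestamp

    def record_write(self, user_id):
        # Call after every successful write made on behalf of this user.
        self._pinned_until[user_id] = self.clock() + self.ttl_seconds

    def target_for_read(self, user_id):
        expiry = self._pinned_until.get(user_id)
        if expiry is None:
            return "replica"
        if self.clock() < expiry:
            return "primary"
        # Pin expired: clean up and fall back to replicas.
        del self._pinned_until[user_id]
        return "replica"
```

The request handler would call record_write after each write and target_for_read before each read, then pick the matching connection pool. Only the writing user is pinned; everyone else keeps reading from replicas.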

Causality tokens for stricter requirements

Some workflows need stronger guarantees than time-based read-your-writes: a user writes record A, another user immediately reads record A and expects to see the write. This is causal consistency across users, and the time-based pin does not help.

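The pattern that does help is causality tokens. After a write, the application captures the WAL position of the commit (in PostgreSQL, the LSN, or log sequence number), for example by calling pg_current_wal_lsn() on the primary once the transaction has committed. The token is propagated to subsequent reads that need the data. Before serving such a read, the replica's replay position, reported by pg_last_wal_replay_lsn(), is compared against the token; if the replica has not yet replayed up to that LSN, the query waits or routes to the primary. Stock PostgreSQL releases through version 16 have no built-in function that blocks until a given LSN has been replayed, so the wait is usually implemented as a short polling loop in the application or the connection router.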

This is a more expensive pattern than the session pin: every write returns a token, every dependent read checks the token, and the token has to be transported between users (typically through a shared resource ID or event). It is the right answer when the consistency requirement is genuinely cross-user and immediate. For most application use cases, the session-pin pattern is sufficient and the causality-token pattern is overkill.
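
A rough sketch of the token check, under some assumptions: the token is the text form of an LSN (such as the value returned by SELECT pg_current_wal_lsn() on the primary after the commit), and get_replica_lsn is a hypothetical callable that runs SELECT pg_last_wal_replay_lsn() on the replica and returns the result as text. The function names, the timeout, and the polling interval are illustrative choices, not part of any library.

```python
import time


def parse_lsn(text):
    """Convert a PostgreSQL LSN such as '16/B374D848' to an integer byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def choose_target(token_lsn, get_replica_lsn, timeout=0.5, poll_interval=0.05,
                  clock=time.monotonic, sleep=time.sleep):
    """Return 'replica' once it has replayed token_lsn, else 'primary' after timeout."""
    required = parse_lsn(token_lsn)
    deadline = clock() + timeout
    while True:
        # The replica is safe to read from once its replay position
        # has reached or passed the commit the reader depends on.
        if parse_lsn(get_replica_lsn()) >= required:
            return "replica"
        if clock() >= deadline:
            return "primary"
        sleep(poll_interval)
```

Falling back to the primary after a short timeout keeps the dependent read correct even when the replica is far behind, at the cost of sending some read load to the primary during lag spikes.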

Read-only paths that always go to replicas

At the opposite end from reads that must hit the primary are paths that never need primary-level consistency. Analytics queries, dashboard aggregations, search indexing, batch jobs, and reports can all tolerate seconds or minutes of lag without anyone noticing or caring.

The pattern is to route these queries to replicas explicitly, often to dedicated replicas tuned for the workload. Middleware such as Pgpool-II can split traffic by statement type, sending eligible SELECTs to replicas and everything else to the primary, while a TCP proxy such as HAProxy can expose separate primary and replica endpoints and leave the choice to the application. The crude version is just two database connection strings, primary and replica, with the application choosing which to use based on the operation.

The discipline that makes this work is treating "primary or replica" as an explicit concern in the code. A function that runs analytics queries should explicitly use the replica pool. A function that updates user state should explicitly use the primary. The default should be primary (because mistakes there cause bugs the user notices), with replica use being an explicit opt-in for paths that have been audited.
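
One way to make the default-to-primary rule hard to get wrong is to force callers to opt in to a replica by keyword. The sketch below is an illustration under assumptions: Router, RecordingPool, the lag_tolerant flag, and the table and column names are all hypothetical, and RecordingPool only stands in for a real connection pool so the example runs on its own.

```python
class RecordingPool:
    """Stand-in for a real connection pool; records the SQL it is given."""

    def __init__(self, name):
        self.name = name
        self.statements = []

    def execute(self, sql, params=()):
        self.statements.append((sql, params))
        return self.name  # a real pool would return rows


class Router:
    """Hands out pools; primary unless the caller explicitly opts in to a replica."""

    def __init__(self, primary_pool, replica_pool):
        self.primary_pool = primary_pool
        self.replica_pool = replica_pool

    def pool_for(self, *, lag_tolerant=False):
        return self.replica_pool if lag_tolerant else self.primary_pool


def update_display_name(router, user_id, name):
    pool = router.pool_for()  # default: primary
    return pool.execute(
        "UPDATE users SET display_name = %s WHERE id = %s", (name, user_id)
    )


def weekly_signup_count(router):
    pool = router.pool_for(lag_tolerant=True)  # audited as safe to lag
    return pool.execute(
        "SELECT count(*) FROM users WHERE created_at > now() - interval '7 days'"
    )
```

Because lag_tolerant is keyword-only and defaults to False, a search for lag_tolerant=True across the codebase yields the complete list of paths that have been declared safe to serve from replicas.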

Lag monitoring is the load-bearing observability

Every replication pattern depends on lag being bounded. The single most important metric for replicated systems is replication lag, measured both in time (seconds behind primary) and in WAL position (how many bytes the replica is behind). PostgreSQL exposes both: the pg_stat_replication view on the primary reports each replica's replay position and replay lag, and on a replica pg_last_wal_replay_lsn() and pg_last_xact_replay_timestamp() report how far replay has progressed.

The alert thresholds depend on the application. For a system using time-based read-your-writes with a 5-second pin, lag over 4 seconds is dangerous because it leaves almost no margin: once lag exceeds the pin, users whose pin has just expired are routed back to a replica that still has not replayed their write. For a system using causality tokens, lag over 30 seconds means dependent reads are timing out waiting for replicas. For a pure-analytics workload, lag of 30 minutes might be fine.
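
As a sketch of how the two measurements and a pin-relative threshold might fit together: the SQL strings below use standard PostgreSQL functions and views (PostgreSQL 10 or later), while the lag_status function, its status names, and the 80 percent warning fraction are assumptions chosen to match the 5-second pin and 4-second threshold in the example above.

```python
# Run on the primary: per-replica lag in bytes and as an interval.
PRIMARY_LAG_SQL = """
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
       replay_lag
FROM pg_stat_replication
"""

# Run on a replica: seconds since the last replayed transaction committed.
# Note: this value also grows while the primary is idle, so pair it with
# the byte-based measurement before alerting.
REPLICA_LAG_SQL = """
SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()) AS lag_seconds
"""


def lag_status(lag_seconds, pin_ttl_seconds=5.0, warn_fraction=0.8):
    """Classify lag relative to the read-your-writes pin window."""
    if lag_seconds >= pin_ttl_seconds:
        return "critical"   # users leaving the pin window can see stale data
    if lag_seconds > pin_ttl_seconds * warn_fraction:
        return "warn"       # margin between lag and pin is nearly gone
    return "ok"
```

Tying the threshold to the pin TTL rather than hard-coding a number means the alert moves automatically if the pin window is ever changed.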

The pattern that prevents most lag-related incidents is to test the application's lag tolerance in a staging environment by artificially slowing replication, for example by setting recovery_min_apply_delay on a staging replica. CronPing can monitor the lag-recovery jobs that fix the slowdown, but the real value is in observing how the application behaves under sustained 30-second lag. The behaviors are usually surprising: timeouts in places nobody expected, error pages that should not exist, retry loops that make the lag worse.

The cache-invalidation interaction

Replication lag interacts badly with caching. A common bug: the write goes to the primary, the cache entry is invalidated, the next read misses the cache, the query goes to a replica, the replica returns stale data, and the stale data is cached. Now the cache holds stale data with a full TTL, and even after replication catches up, the cache stays stale until that entry expires.

There are three fixes: route post-invalidation reads to the primary (using the same session-pin mechanism), invalidate the cache a second time after a delay longer than the expected replication lag, or have the cache include a generation counter that the read query can check against the replica's replay position. The simplest of these is the post-invalidation primary routing: it requires no extra infrastructure and handles the case correctly.
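
The post-invalidation routing can be illustrated with a small in-memory model. Everything here is a stand-in: plain dictionaries play the primary and a lagging replica, and the class and method names are hypothetical. The sketch also pins by cache key rather than by user session, which is an assumption of this example; it has the side effect of covering refills triggered by other users' reads.

```python
class ReadThroughCache:
    """Read-through cache that refills from the primary right after a write."""

    def __init__(self, primary, replica):
        self.primary = primary          # dict standing in for the primary
        self.replica = replica          # dict standing in for a lagging replica
        self.cache = {}
        self.recently_written = set()   # keys still inside the pin window

    def write(self, key, value):
        self.primary[key] = value
        self.cache.pop(key, None)        # invalidate the cached entry
        self.recently_written.add(key)   # route the next refill to the primary

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: pick the source based on whether the key was just written.
        source = self.primary if key in self.recently_written else self.replica
        value = source.get(key)
        self.cache[key] = value
        return value

    def pin_expired(self, key):
        # Call once the pin window (longer than expected lag) has passed.
        self.recently_written.discard(key)
```

Without the recently_written check, the read after the write would refill the cache from the replica and store the old value for a full TTL, which is exactly the bug described above.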

The summary

Read replicas are a powerful scaling tool, and the price they extract is honesty about lag. The patterns that pay the price gracefully (read-your-writes via session pinning, causality tokens for cross-user immediate consistency, explicit replica routing for lag-tolerant queries, lag monitoring as the load-bearing observability, and the special handling of cache invalidation after writes) are not difficult to implement. They are difficult to remember to implement, because the bugs they prevent are intermittent and easy to dismiss as test flakes. The teams that get replication right have all built the discipline of asking, for every query, "where does this go and why," and treating the answer as a real engineering decision rather than a default.

DocuMint, CronPing, FlagBit, and WebhookVault all run on a single SQLite database per product, which has the lovely property of zero replication lag because there is no replication. When we eventually graduate to PostgreSQL with replicas, the routing discipline will already be in place: every query in the codebase has an explicit "this is a write" or "this is a read" annotation, and the replica routing will be a configuration change, not a code rewrite. The teams that survive the move to replicas are the ones that thought about it before they had to.