The pitch for read replicas is simple and seductive. Your primary database is bottlenecked on read load. Spin up a replica, point your read traffic at it, and the primary breathes again. The capacity problem is solved with a single configuration change. Your application stays the same.
The reality is that read replicas introduce a specific class of bugs that did not exist before, and the bugs are subtle enough that most teams adopt replicas, ship them, and only discover the problems weeks later in support tickets that read like ghost stories. The user posted a comment and it did not appear. The order showed up in their email but not in their account history. The setting they just saved still shows as off.
None of these are bugs in the classic sense. The data is correct. It is just at a different point in time on different replicas, and the application has not been told that it is reading from a time-travel device.
What replication lag actually is
A read replica is, mechanically, a second database that subscribes to the write stream of the primary and applies those writes in order. Under any real workload, the replica is behind the primary by some amount. In healthy systems with light load, this is microseconds to single-digit milliseconds. Under heavy write load, network problems, or replica catch-up after a restart, lag can climb to seconds or minutes.
The lag is asymmetric: the primary always reflects the latest write. The replica reflects the latest write that has finished crossing the network and being applied locally. The gap is the failure surface.
The read-your-writes problem
The most common bug pattern is read-after-write. A user POSTs to create a comment, the application writes to the primary, returns 201, and the client immediately fetches the comment list. The fetch goes to a replica. The replica has not yet seen the write. The list comes back without the new comment.
This is not a corner case. Modern web frontends almost always do this: they write something, then re-fetch the page state to update the UI. With a primary-only setup, this works because the read sees the write trivially. With a replica, it works only if the replica has caught up before the read arrives. In a tight loop, that is unlikely.
The user-visible result is a phantom UI: the comment they just posted is missing from the list, the form clears, they re-submit, and now there are two comments. They contact support.
Three patterns that work
Sticky reads after writes. The simplest fix: when a request writes to the primary, the same session reads from the primary for some time window (typically 1-5 seconds, longer than your maximum observed lag). After the window, reads can go to replicas safely. Implementation: a per-session flag, a cookie, or a timestamp the application carries. This pattern is what most ORMs and database proxies have settled on.
Read-from-primary for sensitive paths. Some queries cannot tolerate any staleness: anything in the auth path, account state for billing decisions, anything where stale data could cause double-charging or inconsistent permissions. Mark these queries explicitly to bypass replicas. This is a per-query decision, not per-table; the same table might be read from a replica for a list view and from the primary for a checkout decision.
Causality tokens. The most general solution: when the application writes, the database returns a position token (an LSN in PostgreSQL, a GTID in MySQL). The application stores it for the session. On subsequent reads, the application sends the token to the replica with the query. The replica waits until it has caught up to that position before responding. PostgreSQL implements this directly with pg_wait_for_replay_lsn; some MySQL distributions have similar primitives.
Where replicas actually help
Replicas earn their keep on workloads where staleness is acceptable: analytics queries, reporting dashboards, search indexing, background jobs, logged-out browsing, public profile views, and product catalog reads. These are the bulk of read traffic in most applications, and routing them to replicas frees the primary for the writes and the read-your-writes-sensitive queries.
Replicas do not help when the read workload is small but write workload is large; that needs sharding or a different storage approach. They do not help when most reads are cache-hot; the cache layer is doing the work. They do not help with workloads where every read is sensitive to staleness; you end up routing everything to the primary anyway.
Operational hygiene
Monitor lag obsessively. Replication lag is the single most important number. Alert if it crosses a threshold (start with one second). The metric is available in PostgreSQL via pg_stat_replication.replay_lag and in MySQL via SHOW REPLICA STATUS's Seconds_Behind_Source.
Bound the read-after-write window with the lag. If your lag P99 is 800ms, your read-after-write window should be at least that. If lag spikes to 30 seconds during a backup, every session that writes during the spike will have a stale-read window that exceeds normal expectations. Plan for the spike.
Have a flag to disable replica reads. When lag explodes (always at the worst time), you want to route everything back to the primary. The primary may struggle, but it will at least be correct. A feature flag at the data-access layer that reroutes reads is worth its weight in pages.
Test with artificial lag. In staging, inject 5-second replication delay (or pause the replica's apply thread). Run your full integration suite. Find every read-after-write bug before production does.
Treat the replica as eventually-consistent storage, not a faster primary. The mental model "the replica is a copy of the primary" is wrong in a way that produces these bugs. The correct model is "the replica is an eventually-consistent view that lags the primary by an unbounded amount during failures."
The deeper trade-off
Replicas are an instance of a general pattern in distributed systems: capacity is bought with consistency. Every cache, every CDN, every queue, every replica is the same trade. The system gets more capacity in exchange for the application understanding that some reads are stale.
The teams that succeed with replicas are the ones who treat the consistency cost as a real engineering decision, not an implementation detail. They mark sensitive queries, they monitor lag, they test with injected delay, they have a rollback flag. The teams that struggle are the ones who treat replicas as "free read scaling" and discover the consistency cost via support tickets.
We use the patterns above on the four developer APIs we run at DocuMint, CronPing, FlagBit, and WebhookVault. The interesting thing is that we do not actually use replicas in production for any of them; we use SQLite primaries with WAL mode and the read load is comfortably handled. The patterns matter not because we have replicas today but because we will have them eventually, and the architecture should already know they are coming.