Sharding Postgres Without Tears: A Migration Path from Monolithic to Multi-Shard

The right time to shard PostgreSQL is when a single instance, vertically scaled to whatever your cloud provider's largest box happens to be, can no longer handle the write throughput. Read scaling is a different problem solved by replicas. Storage limits are usually solvable with newer instance types. Connection limits are solvable with PgBouncer. The genuine driver for sharding is sustained write throughput that exceeds what one instance can durably commit, and that threshold is much higher than most teams assume.

The teams that get sharding wrong fall into two camps. The first shard prematurely, before the single-instance approach has been seriously stressed, and pay the operational complexity tax for years before it would have been justified. The second shard reactively, in the middle of a production crisis, with no time to design the shard key carefully, and end up with a shard key that creates hot spots they have to migrate away from a year later. This piece is about avoiding both.

Choose the shard key first, last, and always

The shard key is the field used to decide which shard a row belongs to. It determines the entire access pattern of the system after sharding. Get it right and most queries hit one shard with no fan-out. Get it wrong and every query becomes a scatter-gather across all shards, which is slower than the original single-instance system was.

The ideal shard key has three properties. First, it appears in the WHERE clause of nearly every query. If a query does not include the shard key, the query has to be sent to every shard and the results combined, which is the worst-case access pattern. For most multi-tenant SaaS systems, the natural shard key is tenant_id (organization_id, account_id, customer_id) because every query is naturally scoped to one tenant. Second, the cardinality of the key is much higher than the number of shards — millions of tenants distributed across dozens of shards distribute well; a hundred enterprise tenants across ten shards distribute poorly because individual tenant size variance dominates the distribution. Third, the key's access pattern is not heavily skewed: if 90% of the traffic is for one tenant, that tenant's shard becomes a hot spot regardless of how cleverly you spread the others.

The pathological shard keys are the ones that look obvious and turn out to be wrong. Sharding by user_id in a system where users belong to organizations and queries are organization-scoped means every organization-scoped query becomes a fan-out. Sharding by created_at means recent shards are hot and old shards are cold. Sharding by hash of primary key means every join becomes cross-shard. The discipline is to enumerate the actual query patterns first and pick the shard key that lets the highest-volume queries hit one shard.

The routing layer

Sharded PostgreSQL needs a routing layer that knows the shard map — which shard holds which range or hash bucket of shard-key values — and routes each query accordingly. The implementation choices range from application-level routing (the application code knows the shard map and chooses the connection) to a dedicated proxy (Citus, Vitess for MySQL, hand-rolled PgBouncer with custom routing logic).

Application-level routing is simpler operationally but pushes shard awareness into every place that touches the database. Dedicated proxies are operationally heavier but let the application code stay shard-naive. For teams already running PgBouncer, the proxy approach is a smaller delta. For teams without infrastructure muscle to operate a dedicated proxy, application-level routing concentrated in a single data-access layer is the right starting point.

The shard map itself needs to live somewhere. The cheap-and-correct answer is a small metadata table, replicated to every application instance, that maps shard-key ranges to shard connection strings. The map changes rarely (only during shard splits or rebalancing), so cache invalidation is straightforward. The expensive-and-flexible answer is a dedicated configuration service. For most teams, the cheap answer scales further than the marketing suggests.

The dual-write phase

The migration from single-instance to sharded PostgreSQL cannot happen atomically. You will run both topologies in parallel for some period, with writes going to both, until you are confident enough in the sharded system to cut reads over and decommission the single instance.

The dual-write phase has a critical invariant: writes must reach both systems consistently. If the application writes to the old system, then writes to the new system, and the second write fails, the two systems diverge. The fix patterns are the standard ones — outbox table on the old system that a worker drains into the new system, two-phase commit if you can stomach it, idempotent writes with retries — and the choice depends on how strict the consistency requirement is during migration.

Reads during dual-write should continue to come from the old system as the source of truth, with periodic comparison runs that verify the new system has the same data. Comparison-driven validation is much more reliable than testing-driven validation because it operates on real data, real volumes, real edge cases. The comparison should produce a small set of mismatches that you can investigate; if it produces a large set, the dual-write logic has a bug and cutting over would be premature.

The cutover

The cutover from old to new is a routing change, not a data migration. By the time you cut over, both systems have been receiving writes in parallel for some period, the comparison runs are clean, and the new system is operationally ready. The cutover then is: change the read path to go to the new system, monitor closely, and either stay there or roll back if something breaks.

The rollback is the load-bearing safety net. If the cutover reveals a problem, you need to be able to switch reads back to the old system in seconds, not hours. This requires keeping the dual-write phase active even after the read cutover, until you have enough confidence to stop dual-writing. The dual-write tax during this period is real (every write costs twice the resources) but the optionality it buys is worth the cost for any non-trivial migration.

The cross-shard query problem

Some queries are inherently cross-shard. Reports, analytics, admin dashboards that need to see aggregate state across all tenants. The right pattern depends on the query's freshness requirement. For reports that can tolerate hour-old data, an analytics replica that aggregates across shards is the right answer. For real-time admin dashboards, scatter-gather queries with parallel shard execution are unavoidable but should be carefully scoped.

The wrong pattern is letting cross-shard queries leak into the user-facing path. If a user-facing endpoint requires a cross-shard query, the architecture is wrong — either the shard key is wrong (the query should have been single-shard) or the feature should not be shard-aware at all (it belongs in a separate analytics system, not the operational database).

When sharding is the wrong answer

The signs that sharding is premature are unambiguous if you look. The single instance is not actually saturated — peak CPU is under 60%, peak write IOPS is under what the disk can sustain, the largest tables fit comfortably in memory. Read replicas have not been added yet, and would absorb significant load if they were. The team has not invested in connection pooling, query optimization, or vertical scaling. Sharding is being proposed because it sounds more sophisticated than the simpler interventions that would actually move the needle.

The other sign is that the team does not know what the shard key would be. If you cannot articulate, in one sentence, which field every query will be scoped by, you are not ready to shard. The shard-key conversation has to come first, before the engineering work, because everything else depends on it.

Our use across products

The four products in this studio — DocuMint, CronPing, FlagBit, WebhookVault — run on SQLite, which is to say they run on the smallest possible database with the smallest possible operational footprint. The sharding question is years away if it ever arises. The migration path we have planned, when SQLite eventually becomes the bottleneck, is to PostgreSQL on a single instance, and only then to sharded PostgreSQL if the write volume genuinely demands it. The discipline is to defer the complexity until the simpler tool has actually been outgrown, and to use the time bought by deferral to understand the access patterns well enough to choose the shard key correctly when the moment comes.

The summary

Sharding PostgreSQL is the correct answer to a specific problem — sustained write throughput that exceeds what one instance can handle — and a wrong answer to almost every other problem that gets proposed for it. The hard part is not the sharding mechanics; it is choosing the shard key correctly, designing the dual-write migration so the cutover is reversible, and resisting the temptation to shard before the simpler interventions have been exhausted. Teams that get this right end up with a sharded system that costs operationally what the unsharded system did, just at higher throughput. Teams that get this wrong end up with a sharded system that is more complex, more fragile, and not actually faster.