Postgres Logical Replication: Patterns Beyond Streaming Replication

Streaming replication is the default in Postgres and the only kind most teams ever set up. The primary streams the WAL — the binary write-ahead log — to one or more replicas, which apply it block-for-block. The result is a byte-identical replica that lags the primary by network round trip plus apply latency. It is a good default and the right answer when you want a full hot standby for failover.

Logical replication is the other Postgres replication mechanism, available since Postgres 10 in 2017. Instead of streaming WAL bytes, the primary decodes the WAL into a logical change stream — INSERT, UPDATE, DELETE on specific tables — and the replica applies those statements via its normal SQL execution path. This sounds like a small difference but enables a different category of replication topologies that streaming cannot do.

This post covers when logical replication is the right tool, the operational patterns that earn their complexity, the constraints that bite teams who haven't read the documentation, and the cases where streaming replication is still the better answer.

What logical replication enables that streaming does not

Four capabilities matter. First, partial replication: you can replicate a subset of tables rather than the whole database, which is the right tool for sharing operational tables with an analytics warehouse or replicating reference data to multiple regional databases. Second, cross-version replication: a Postgres 14 primary can logically replicate to a Postgres 16 replica, which is the supported mechanism for major-version upgrades without downtime. Third, write-active replicas: a logical replica is a normal Postgres database that accepts writes to non-replicated tables, which lets you have replicas that augment the replicated data with their own local state. Fourth, multi-source replication: a single replica can subscribe to multiple primary databases via separate publications, which enables consolidation patterns that streaming replication cannot express.

The publication-subscription model

Logical replication uses publish-subscribe semantics. On the primary you create a PUBLICATION with CREATE PUBLICATION my_pub FOR TABLE users, orders, products; on the replica you create a SUBSCRIPTION with CREATE SUBSCRIPTION my_sub CONNECTION '...' PUBLICATION my_pub. The subscription pulls changes from the publication continuously and applies them locally. You can publish entire schemas with FOR ALL TABLES IN SCHEMA, or filter tables with a row-level WHERE clause (added in Postgres 15).

The replication slot on the primary is the durability mechanism — it holds WAL until the subscription has confirmed receipt, which prevents data loss if the replica is offline temporarily but creates a disk-fill risk if the replica is offline for a long time. The slot grows the WAL on the primary at a rate proportional to write throughput, and a forgotten slot from a dead replica is one of the canonical Postgres operational disasters.

The constraints that bite

Logical replication has constraints that streaming does not. The replicated tables must have a primary key or REPLICA IDENTITY FULL set, because logical replication needs row identity to apply UPDATE and DELETE. Tables without primary keys can be added to a publication only with REPLICA IDENTITY FULL, which logs the entire row image on every change and is correspondingly expensive.

DDL is not replicated. CREATE TABLE, ALTER TABLE, CREATE INDEX, and other schema changes on the primary do not propagate to the replica. The schema-evolution discipline is therefore: apply DDL on the replica first (where it cannot conflict with replication), then on the primary. This is the inverse of how most teams think about schema migrations and produces footguns when teams forget the constraint.

Sequences are replicated only as their last_value, not as the source of truth. After a logical-replica failover the sequence values may be behind the actual primary keys in the data, leading to "duplicate key" errors on the next INSERT. The pattern for cutover is to advance sequences on the new primary by a safety margin before promoting it.

Large objects (the LO type, separate from BYTEA) are not replicated. TRUNCATE replicates by default in Postgres 11+, but this can be disabled per publication. Tables with foreign keys are replicated independently per table, so the foreign-key constraint must already exist on the replica before the replicated rows can be applied — a schema mismatch causes the subscription to error and pause.

Pattern: zero-downtime major version upgrade

The classic use case for logical replication is upgrading Postgres major versions without downtime. The procedure: stand up a new server running the target version, dump the schema from the old primary and load it into the new server, create a publication on the old primary FOR ALL TABLES, create a subscription on the new server to copy initial data and stream ongoing changes, wait for the subscription to catch up to within a few seconds of the primary, then cut over by repointing the application to the new server.

The cutover step is brief — typically a few seconds during which writes are paused, the subscription is allowed to drain, sequences are advanced, and the application connection string changes. The rollback path is to reverse the publication direction temporarily so the old server can receive writes from the new server, although this is rarely needed in practice.

The same pattern works for migrating between cloud providers (AWS RDS to self-hosted, or vice versa), upgrading the underlying hardware, or splitting a monolithic database into multiple smaller databases.

Pattern: partial replication for analytics

The pattern for sharing operational data with a warehouse is to create a publication for the tables that the warehouse needs and a subscription on a separate Postgres instance that the warehouse reads from. The warehouse instance can have additional indexes optimized for analytical queries, additional materialized views derived from the replicated data, and additional local tables that do not exist on the primary.

This is much cheaper than ETL pipelines for the up-to-the-second case, and the analytics database is a normal Postgres instance that warehouse tools (Metabase, Superset, dbt) can connect to directly. The constraint is that the analytics database is read-only with respect to the replicated tables — writes to those tables would conflict with incoming replication.

Pattern: regional replicas for read latency

For applications that need low read latency from multiple geographic regions, logical replication can populate per-region read replicas. The primary stays in one region; each regional replica subscribes to the publication and serves local reads. Writes route to the primary, with WAN-latency cost, but reads stay local.

The trade-off versus streaming replication is that you can replicate a subset of tables (saving disk on the regional replica) and the regional replica can have local-only tables for caching, analytics, or per-region session data. The downside is the constraints listed above, particularly the schema-evolution discipline, which is harder to manage across multiple regional replicas than across a single hot standby.

When streaming is still the right answer

Logical replication is more flexible than streaming and correspondingly more complex. For the canonical use case of "I want a hot standby for failover," streaming replication is simpler, cheaper, and more battle-tested. The schema-mismatch failure modes do not exist; DDL replicates automatically; sequences are byte-identical; large objects work; and the apply latency is lower because the replica does not parse SQL.

The decision tree is: if you want a full byte-for-byte copy of the database for failover, use streaming. If you want a partial copy, a cross-version copy, a write-active copy, or a multi-source copy, use logical. The two are not mutually exclusive — production systems often use streaming replication for hot standby and logical replication for analytics or migration. The configuration overhead of running both is small once you understand the model.

The operational signals to monitor

The four metrics that catch most logical-replication problems: replication lag in bytes (pg_stat_replication.replay_lag for streaming, pg_stat_subscription on the replica for logical), slot WAL size (pg_replication_slots.pg_wal_lsn_diff against current LSN), subscription apply errors (logged to the replica's standard error and to pg_subscription_rel for table-level state), and conflict count (pg_stat_database_conflicts on the replica for streaming, application-level for logical). Alerting on slot WAL size is the most important — a forgotten slot can fill the primary's disk in hours under heavy write load, and the resulting outage is significantly worse than the original problem the replication was trying to solve.

Read more