Topic: Postgres logical replication internals
Depth: Deep dive into internals and operational tradeoffs
When to read: Before setting up cross-cluster table sync or CDC pipelines
Physical replication copies every WAL byte from primary to replica. The replica becomes a byte-for-byte mirror. That's useful, but it's not selective. You get everything or nothing.
Logical replication is different. It decodes WAL into row-level operations — INSERT, UPDATE, DELETE — and replays them on a subscriber. You control which tables are included. The subscriber can have different indexes, different column sets, even run different Postgres versions.
The Publisher/Subscriber Model
The publisher exposes a publication — a named set of tables. The subscriber connects to that publication and creates a subscription. Postgres handles the rest.
-- On the publisher
CREATE PUBLICATION prod_events FOR TABLE events, audit_log;
-- FOR ALL TABLES is tempting but dangerous
-- Every new table you create is added to the publication automatically
-- Usually not what you want
CREATE PUBLICATION everything FOR ALL TABLES;
The subscriber side:
-- On the subscriber
CREATE SUBSCRIPTION analytics_sub
CONNECTION 'host=prod-db port=5432 dbname=myapp user=replicator password=...'
PUBLICATION prod_events;
-- Check subscription status
SELECT * FROM pg_stat_subscription;
When the subscription is created, Postgres copies the initial data from the publication tables to the subscriber (a sync phase), then switches to streaming new changes. The initial copy uses COPY internally. It's fast, and it reads from a consistent snapshot taken when the sync slot is created on the publisher, so it does not block writes there. The cost is that the publisher keeps that snapshot open until the copy finishes.
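To see where each table is in that copy-then-stream handoff, the subscriber's pg_subscription_rel catalog can be queried. This is a minimal sketch; it assumes a role that is allowed to read pg_subscription.

```sql
-- On the subscriber: per-table sync state for every subscription
-- srsubstate codes: i = initialize, d = data is being copied,
--                   f = finished table copy, s = synchronized, r = ready
SELECT s.subname,
       sr.srrelid::regclass AS table_name,
       sr.srsubstate,
       sr.srsublsn
FROM pg_subscription_rel AS sr
JOIN pg_subscription AS s ON s.oid = sr.srsubid
ORDER BY s.subname, sr.srrelid::regclass::text;
```

A table that stays in state d for a long time is still in its initial copy; once every row shows r, the subscription is purely streaming.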
Replication Slots and WAL Retention
Logical replication uses a replication slot on the publisher. The slot tracks how far the subscriber has consumed the WAL stream. As long as the slot exists, Postgres keeps WAL segments the subscriber hasn't processed yet.
-- Monitor slot lag
SELECT slot_name, restart_lsn, confirmed_flush_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
The problem: if your subscriber falls behind or disconnects, the WAL retained for its slot keeps growing. Postgres can't remove the WAL the slot needs. On a busy publisher, a stale logical replication slot is a disk-filling time bomb.
Monitor pg_replication_slots. Alert on lag_bytes > 1GB. Drop slots that belong to dead subscribers — the WAL will accumulate indefinitely otherwise.
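One way the cleanup might look in practice is sketched below. The slot name analytics_sub is an assumption (a subscription's slot is named after the subscription unless told otherwise), the 10GB cap is an arbitrary illustrative value, and both wal_status and max_slot_wal_keep_size require PostgreSQL 13 or later.

```sql
-- On the publisher: logical slots with no connected consumer,
-- and how much WAL each one is pinning
SELECT slot_name, active, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE slot_type = 'logical' AND NOT active;

-- Drop a slot whose subscriber is gone for good (slot name is hypothetical)
SELECT pg_drop_replication_slot('analytics_sub');

-- Optional safety net: cap the WAL any slot may retain (illustrative value)
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();
```

The cap trades one failure for another: a slot that exceeds it is invalidated, and its subscriber has to be re-synced from scratch. That is usually still better than a full disk on the publisher.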
The DDL Gotcha
Logical replication does not replicate DDL. At all.
If you run ALTER TABLE events ADD COLUMN source TEXT on the publisher, nothing happens on the subscriber. The subscriber's table still has the old schema. When Postgres tries to replay an INSERT that includes the new column, the subscriber's apply worker fails with an error saying the target relation is missing a replicated column. By default it is restarted and hits the same error on every retry until the schemas match.
The recovery path:
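- Apply the DDL on the subscriber manually, before or immediately after the publisher
- The apply worker retries on its own once the schemas match. If the subscription was disabled (manually or through disable_on_error), re-enable it: ALTER SUBSCRIPTION sub_name ENABLE
- If you're using FOR ALL TABLES, create the new table on the subscriber, then run ALTER SUBSCRIPTION sub_name REFRESH PUBLICATION so the table starts replicating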
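The steps above can be sketched as an ordered sequence for the ADD COLUMN case. This assumes the events table and analytics_sub subscription from the earlier examples. It applies the change on the subscriber first, which works because a subscriber table may carry extra columns the publisher does not send.

```sql
-- 1. On the subscriber first: an extra column here is harmless
ALTER TABLE events ADD COLUMN source TEXT;

-- 2. Then on the publisher
ALTER TABLE events ADD COLUMN source TEXT;

-- 3. On the subscriber, only when new tables joined the publication
ALTER SUBSCRIPTION analytics_sub REFRESH PUBLICATION;

-- 4. On the subscriber, only if the subscription had been disabled
ALTER SUBSCRIPTION analytics_sub ENABLE;
```

For changes that remove or narrow a column, the safer order is generally the reverse: publisher first, then subscriber.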
Teams discover this the first time they do a schema migration on a system with logical replication active. It's not subtle — the worker stops immediately and logs a clear error. But if nobody is watching the subscriber, the lag grows silently until someone notices the data is weeks stale.
Monitoring
-- On the publisher: replication slot health
SELECT slot_name, active, restart_lsn, confirmed_flush_lsn,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_type = 'logical';
-- On the subscriber: subscription status
SELECT subname, pid, received_lsn, last_msg_send_time,
latest_end_lsn, latest_end_time
FROM pg_stat_subscription;
Conflict Resolution (or Lack Thereof)
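When the subscriber tries to apply a change that violates a constraint, such as a duplicate key on INSERT, the apply worker fails. Postgres logs the error, and the subscription makes no further progress until the conflict is resolved. An UPDATE or DELETE whose target row is missing on the subscriber is not treated as an error: the change is skipped, so the two sides can drift apart without any alarm.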
You have three options:
- Skip the conflicting transaction: SELECT pg_replication_origin_advance('pg_N', '<LSN>'), where N is the subscription's OID and the LSN comes from the subscription error log
- Fix the subscriber data manually to remove the conflict, then let the worker retry (or re-enable the subscription if it was disabled)
- Disable the subscription, drop the table, and re-sync (the nuclear option)
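The first option could look roughly like the sketch below, run on the subscriber. The subscription name analytics_sub is assumed from the earlier example; 'pg_N' and '<LSN>' are placeholders to fill in from your own system, not literal values.

```sql
-- 1. Stop the apply worker before touching the origin
ALTER SUBSCRIPTION analytics_sub DISABLE;

-- 2. Look up the origin name: 'pg_' followed by the subscription OID
SELECT 'pg_' || oid::text AS origin_name
FROM pg_subscription
WHERE subname = 'analytics_sub';

-- 3. Move the origin past the failed transaction.
--    The PostgreSQL docs call for the LSN immediately after the
--    "finished at" LSN reported in the error log.
SELECT pg_replication_origin_advance('pg_N', '<LSN>');

-- 4. Resume applying changes
ALTER SUBSCRIPTION analytics_sub ENABLE;

-- PostgreSQL 15+ alternative: skip by the finish LSN itself
ALTER SUBSCRIPTION analytics_sub SKIP (lsn = '<LSN>');
```

Either form discards the whole transaction, including any changes in it that would have applied cleanly, so the subscriber may need manual repair afterwards.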
The implication: logical replication assumes the subscriber is mostly read-only. If you're writing to tables on both sides, conflicts are inevitable and manual resolution is painful. Logical replication is not bidirectional sync.
When Logical Replication Is Wrong
You need DDL sync: Use pg_dump | psql for full schema migrations. Or look at tools like pglogical which add DDL replication (though with caveats).
You need real-time CDC for analytics: Logical replication works, but Debezium with the pgoutput plugin (which is what logical replication uses internally) gives you more control over consumer behavior, offset management, and schema registry integration. Debezium handles reconnects, slot lifecycle, and Kafka integration. Rolling your own on top of pg_recvlogical is rarely worth it.
You need a true standby: Use streaming physical replication. Logical replication leaves gaps for DDL, sequences, and large objects. TRUNCATE is replicated by default since PostgreSQL 11 and can be turned off through the publication's publish parameter.
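Which operations a given publication actually forwards can be checked rather than assumed. A minimal sketch, run on the publisher, assuming the prod_events publication from the earlier example and PostgreSQL 11 or later for the pubtruncate column:

```sql
-- Which operations each publication forwards
SELECT pubname, puballtables,
       pubinsert, pubupdate, pubdelete, pubtruncate
FROM pg_publication;

-- Set the forwarded operations explicitly
ALTER PUBLICATION prod_events
    SET (publish = 'insert, update, delete, truncate');
```

Sequences and large objects have no equivalent switch; they have to be carried over by other means.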
What Logical Replication Actually Is
The useful mental model: logical replication is not "copying your data to another server." It's "routing a selected stream of row-level changes to a subscriber that interprets and applies them." The subscriber is not a replica. It's a consumer of a change stream that happens to materialize as a table.
That framing makes the constraints obvious. DDL isn't part of the row-level change stream. Conflicts happen when the consumer's local state diverges from what the publisher assumes. The slot is a cursor in the WAL, not a backup.