engineering

Postgres Logical Decoding: Streaming Change Data Capture Without a Separate Service

Logical decoding turns Postgres into a streaming source of row-level changes. Used well it eliminates a service. Used poorly it produces phantom slot-fill outages.

Anethoth

16 May 2026 — 4 min read

Most teams that want change data capture reach for Debezium or a managed CDC service before checking what Postgres can do natively. Logical decoding, available since Postgres 9.4 in 2014 and substantially improved through Postgres 16, exposes a stream of row-level changes as they commit. With the right output plugin and a small consumer process, you can have CDC without a separate service. With the wrong configuration, you can have a slot that silently grows until the disk fills.

We use Postgres-style reasoning about replication slots across all four products even though we run SQLite today: DocuMint for invoice-event replication, CronPing for monitor-status streams, FlagBit for flag-change broadcasts, and WebhookVault for delivery streaming. When teams ask about graduating to Postgres, logical decoding is one of the underappreciated features that genuinely shifts the architecture.

What logical decoding actually does

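The mechanism builds on the write-ahead log that Postgres already maintains for crash recovery and physical replication. A logical replication slot tells Postgres to retain WAL beyond the normal recycle horizon and to decode it through an output plugin into row-level change events. The plugin receives INSERT, UPDATE, DELETE, and TRUNCATE operations along with the new column values. How much of the old row accompanies an UPDATE or DELETE depends on the table's REPLICA IDENTITY setting: by default only the primary-key columns, and the full previous row only with REPLICA IDENTITY FULL. With pgoutput, the stream can additionally be filtered by publication.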

The two most common output plugins are pgoutput (the built-in plugin used by Postgres logical replication and supported by Debezium) and wal2json (a separately installed JSON emitter useful for ad-hoc consumers). pgoutput is preferable when the consumer can parse its binary message format, because it is more efficient and is maintained as part of Postgres itself. wal2json is preferable when you need a quick JSON-over-WebSocket bridge to a downstream service.
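As a rough illustration of what a wal2json consumer has to handle, the sketch below flattens one transaction in wal2json's format version 1 into per-row events. The sample payload, the invoices table, and the flatten_changes helper are hypothetical and were not captured from a real server; check the plugin's documentation for the exact fields your version emits.

```python
import json


def flatten_changes(payload: str) -> list:
    """Turn one wal2json (format version 1) transaction into row events."""
    events = []
    for change in json.loads(payload).get("change", []):
        event = {
            "op": change["kind"],  # "insert", "update" or "delete"
            "table": f'{change["schema"]}.{change["table"]}',
            "new": None,
            "old_key": None,
        }
        # Inserts and updates carry the new row as parallel name/value arrays.
        if "columnnames" in change:
            event["new"] = dict(zip(change["columnnames"], change["columnvalues"]))
        # Updates and deletes identify the old row by its replica identity.
        if "oldkeys" in change:
            keys = change["oldkeys"]
            event["old_key"] = dict(zip(keys["keynames"], keys["keyvalues"]))
        events.append(event)
    return events


# Hypothetical payload for a transaction touching a made-up "invoices" table.
SAMPLE = json.dumps({
    "change": [
        {"kind": "insert", "schema": "public", "table": "invoices",
         "columnnames": ["id", "status"], "columntypes": ["integer", "text"],
         "columnvalues": [1, "draft"]},
        {"kind": "update", "schema": "public", "table": "invoices",
         "columnnames": ["id", "status"], "columntypes": ["integer", "text"],
         "columnvalues": [1, "sent"],
         "oldkeys": {"keynames": ["id"], "keytypes": ["integer"], "keyvalues": [1]}},
        {"kind": "delete", "schema": "public", "table": "invoices",
         "oldkeys": {"keynames": ["id"], "keytypes": ["integer"], "keyvalues": [1]}},
    ]
})

events = flatten_changes(SAMPLE)
```

Note that the delete event carries only the key columns, which is what a table with the default replica identity would produce.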

The slot is the load-bearing primitive

A logical replication slot has two critical properties. First, it pins the WAL position at the consumer's last acknowledged LSN. Postgres will not recycle WAL beyond that position until the slot advances or is dropped. Second, it survives Postgres restarts. The combination means a slow or stopped consumer translates directly into accumulated WAL on disk, and a forgotten slot from a decommissioned consumer will silently grow until the disk fills.

The Postgres documentation has been warning about this for over a decade and the warning still surprises teams in production. Monitoring is non-negotiable: pg_replication_slots exposes each slot's active state and its restart_lsn, from which the retained WAL in bytes can be computed with pg_wal_lsn_diff against the current WAL position. An alert on either restart_lsn falling behind by more than a few GB or active=false on an expected slot will catch the failure mode before it becomes an outage.
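The lag arithmetic is simple enough to sketch. An LSN is printed as two hexadecimal numbers separated by a slash, the high and low 32 bits of a byte position, so the difference between two LSNs is a byte count. The helper names, the slot name, and the 2 GiB threshold below are illustrative assumptions; in practice the slot row would come from pg_replication_slots and the current position from pg_current_wal_lsn().

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Same arithmetic as pg_wal_lsn_diff(current_lsn, restart_lsn)."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


def slot_alerts(slot: dict, current_lsn: str, max_bytes: int) -> list:
    """Return alert strings for one row shaped like pg_replication_slots."""
    alerts = []
    if not slot["active"]:
        alerts.append(f'{slot["slot_name"]}: no consumer attached')
    # restart_lsn is NULL (None here) for a slot that has not reserved WAL yet.
    if slot["restart_lsn"] is not None:
        lag = retained_bytes(current_lsn, slot["restart_lsn"])
        if lag > max_bytes:
            alerts.append(f'{slot["slot_name"]}: retaining {lag} bytes of WAL')
    return alerts


# Hypothetical slot that is both detached and 4 GiB behind.
stale = {"slot_name": "cdc_search", "active": False, "restart_lsn": "0/0"}
alerts = slot_alerts(stale, current_lsn="1/0", max_bytes=2 * 1024**3)
```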

The consumer pattern

The consumer is a small long-lived process that connects to Postgres over a replication connection (which speaks the streaming replication protocol rather than serving ordinary SQL traffic), creates or attaches to a named slot, and streams events from the slot. The events arrive in commit order with transaction boundaries marked. The consumer's job is to process each event idempotently and acknowledge the LSN it has processed; Postgres uses that acknowledgment to advance the slot's retention horizon.

Three operational properties matter for the consumer. It must be idempotent because reconnection after a crash will replay events from the last acknowledged LSN. It must acknowledge frequently enough to prevent slot bloat, but not so frequently that it becomes a bottleneck (every few seconds is typical). It must handle at-least-once delivery semantics rather than assuming exactly-once, because the window between processing an event and acknowledging it is real: a crash inside that window means the event is delivered again after reconnecting.
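The interaction between batched acknowledgment and replay can be shown with a small in-memory simulation. This is a sketch under stated assumptions, not a real replication client: LSNs are plain integers, the stream is a fixed list, acknowledgment is count-based rather than time-based, and the consumer is assumed to persist applied_lsn atomically with its derived state so that both survive a crash. All names are hypothetical.

```python
class IdempotentConsumer:
    """Applies change events keyed by LSN and acknowledges in batches."""

    def __init__(self, ack_every: int = 3):
        self.state = {}        # derived projection: key -> latest value
        self.applied_lsn = 0   # highest LSN applied (assumed stored with state)
        self.acked_lsn = 0     # last LSN reported back to the server
        self.ack_every = ack_every
        self._since_ack = 0

    def handle(self, lsn: int, key: str, value: str) -> bool:
        # Events at or below applied_lsn are replays; skipping them is what
        # makes redelivery after a reconnect harmless.
        if lsn <= self.applied_lsn:
            return False
        self.state[key] = value
        self.applied_lsn = lsn
        self._since_ack += 1
        if self._since_ack >= self.ack_every:
            self.acknowledge()
        return True

    def acknowledge(self) -> None:
        # In a real consumer this is where feedback is sent to Postgres.
        self.acked_lsn = self.applied_lsn
        self._since_ack = 0


# Made-up stream of (lsn, key, value) events.
STREAM = [(10, "flag:a", "on"), (20, "flag:b", "off"), (30, "flag:a", "off"),
          (40, "flag:c", "on"), (50, "flag:b", "on")]


def replay_from(acked_lsn: int) -> list:
    """What the server resends: everything after the acknowledged position."""
    return [event for event in STREAM if event[0] > acked_lsn]


consumer = IdempotentConsumer(ack_every=3)
# First run: four events are applied, but only the first three are acknowledged
# before the process "crashes".
for event in STREAM[:4]:
    consumer.handle(*event)
# Reconnect: the event at LSN 40 is delivered a second time and skipped.
results = [consumer.handle(*event) for event in replay_from(consumer.acked_lsn)]
```

The duplicate at LSN 40 is the at-least-once case in miniature: it was applied but never acknowledged, so the server has no choice but to send it again.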

Where logical decoding shines

Four use cases reliably justify the operational complexity. First, feeding a search index or analytics warehouse without application-level dual writes: the database is canonical and the downstream system is a derived projection that can be rebuilt from a fresh snapshot plus the change stream. Second, materialized views in another service that need real-time updates: a customer dashboard served from a denormalized cache rather than the operational tables. Third, webhook fan-out where the database insert is the canonical event source: the event is generated automatically rather than by application-layer publish-after-commit code that can drop events on a crash. Fourth, cross-version Postgres upgrades via logical replication, which is a standard low-downtime upgrade path for major versions.

The transactional outbox pattern, often presented as an alternative to CDC, is structurally similar but moves the slot-management problem into the application. Logical decoding is the right answer when the outbox would just be the entire database; outbox is the right answer when you want to filter to a small set of explicitly-published events.

What logical decoding does not replace

Three limits matter. First, logical decoding does not capture DDL; schema migrations need separate coordination with the consumer. Second, large transactions (millions of rows in one COMMIT) produce correspondingly large decoding bursts that can overwhelm the consumer, because by default nothing is emitted until the transaction commits; the Postgres 14+ streaming-of-in-progress-transactions feature partially fixes this but requires plugin support. Third, a logical slot is bound to a single database in every Postgres version; capturing changes from several databases needs one slot per database or a different approach.

The consumer also cannot easily ask retrospective questions about historical state. The slot replays from a position, not a time, and old WAL is eventually recycled. For point-in-time historical queries, you still want a separate audit log or event store.

Configuration knobs that matter

wal_level must be set to logical (default is replica), which requires a restart and adds modest WAL overhead even when no slot is active. max_replication_slots and max_wal_senders need to be sized for the number of concurrent slots. max_slot_wal_keep_size (Postgres 13+) is the safety net that caps slot-driven WAL retention at a configured size: a slot that falls further behind than the cap is invalidated (not dropped) and its WAL is released before the disk fills, at the price of that consumer having to resynchronize from scratch. Its default of -1 means unlimited retention, so this is the single most important knob to set in production. logical_decoding_work_mem (Postgres 13+) governs how much memory the decoder uses before spilling to disk during large transactions.
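A pre-flight check for these knobs might look like the following sketch. It assumes the settings arrive as a name-to-string mapping, for example from SELECT name, setting FROM pg_settings; the function name and the message wording are hypothetical.

```python
def check_cdc_settings(settings: dict, slots_needed: int) -> list:
    """Return human-readable problems with the settings discussed above."""
    problems = []
    if settings.get("wal_level") != "logical":
        problems.append("wal_level is not 'logical' (changing it needs a restart)")
    if int(settings.get("max_replication_slots", 0)) < slots_needed:
        problems.append("max_replication_slots is below the number of slots needed")
    if int(settings.get("max_wal_senders", 0)) < slots_needed:
        problems.append("max_wal_senders is below the number of slots needed")
    # -1 is the default and means a slot may retain WAL without any limit.
    if int(settings.get("max_slot_wal_keep_size", -1)) < 0:
        problems.append("max_slot_wal_keep_size is unlimited")
    return problems


# Values resembling a stock installation (illustrative, not read from a server).
stock = {"wal_level": "replica", "max_replication_slots": "10",
         "max_wal_senders": "10", "max_slot_wal_keep_size": "-1"}
# The same instance after tuning; 20480 MB is an arbitrary example cap.
tuned = {"wal_level": "logical", "max_replication_slots": "10",
         "max_wal_senders": "10", "max_slot_wal_keep_size": "20480"}

stock_problems = check_cdc_settings(stock, slots_needed=2)
tuned_problems = check_cdc_settings(tuned, slots_needed=2)
```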

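The replication user needs the REPLICATION attribute, plus CREATE permission on the database if it will create publications. Network access from the consumer needs a pg_hba.conf entry that matches that user and the target database. The special replication keyword in the database column matches only physical replication connections, so a logical decoding connection has to be matched by the name of the database it connects to.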

Operational signals to monitor

Five metrics catch most failure modes. Slot lag in bytes from pg_replication_slots (pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) is the primary alert; threshold depends on disk headroom but a few GB is typical. Slot active status: an inactive slot for more than a few minutes indicates a stopped consumer. WAL generation rate in bytes per second: an order-of-magnitude jump correlates with bulk operations that may overwhelm consumers. Consumer-side processing latency: the gap between event timestamp and processing time, alerted when over the SLO. Failed-event count: idempotent retries are fine, but persistent failures indicate a schema-incompatibility bug worth surfacing.
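The three signals that do not come straight from pg_replication_slots can be derived with a few lines of arithmetic. The sketch below assumes WAL positions have already been converted to byte counts (for instance with pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0')) and that the consumer reports its own latency and failure count. The function names, the alert labels, and every default threshold are hypothetical placeholders to be tuned per deployment.

```python
def wal_rate(prev_bytes: int, prev_time: float,
             cur_bytes: int, cur_time: float) -> float:
    """Bytes of WAL generated per second between two samples."""
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (cur_bytes - prev_bytes) / elapsed


def derived_alerts(rate: float, baseline_rate: float, latency_s: float,
                   failed_events: int, *, jump_factor: float = 10.0,
                   latency_slo_s: float = 30.0, max_failed: int = 0) -> list:
    """Evaluate WAL rate, processing latency and failure count."""
    alerts = []
    # An order-of-magnitude jump over the baseline suggests a bulk operation.
    if baseline_rate > 0 and rate >= jump_factor * baseline_rate:
        alerts.append("wal_rate_jump")
    if latency_s > latency_slo_s:
        alerts.append("latency_over_slo")
    # Retries are expected; failures that persist are the signal.
    if failed_events > max_failed:
        alerts.append("persistent_failures")
    return alerts


# Two made-up samples ten seconds apart: 50 MB of WAL in between.
rate = wal_rate(0, 0.0, 50_000_000, 10.0)
bulk_load = derived_alerts(rate, baseline_rate=400_000.0, latency_s=2.0, failed_events=0)
stuck_consumer = derived_alerts(300_000.0, baseline_rate=400_000.0, latency_s=45.0, failed_events=3)
```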

The deeper observation

Logical decoding is a feature that exists because Postgres treats the WAL as a first-class addressable resource and exposes the same machinery for replication that it uses internally for crash recovery. The architectural decision to make the WAL the canonical record of changes (rather than an internal implementation detail) is what makes CDC possible without bolting on an additional system. The pattern recurs across the database engineering decisions that have aged well: the choice to expose internals as composable primitives turns out, decades later, to enable use cases the original engineers did not anticipate. The cost is operational complexity around the new primitive (slot management, monitoring, consumer reliability) that teams underestimate at first and learn to respect after the first slot-fill incident.
