Most tutorials show you pg_basebackup -h localhost -U postgres -D /backup and stop there. That command works. It also silently makes choices you should understand.
What pg_basebackup Actually Does
pg_basebackup connects to your running Postgres server and streams a copy of the entire cluster data directory over the replication protocol. The server keeps running. Your application keeps writing. pg_basebackup ensures consistency through a checkpoint and continuous WAL streaming — not by locking anything.
The result is a base backup: a snapshot of your data files plus enough WAL to bring them to a consistent state.
--wal-method: stream vs fetch vs none
This flag determines what happens to WAL during the backup:
- stream (default since PG10): opens a second connection and streams WAL in parallel with the data copy. The backup is self-contained. This is what you want.
- fetch: copies WAL files after the data copy completes. Works only if your wal_keep_size is large enough that Postgres hasn't recycled the WAL files you need. On a busy server, this is a race you can lose: if a needed segment is already gone, the backup fails at the very end and is unusable.
- none: no WAL included. The backup cannot restore without an external WAL archive. Only use this if you have a continuous WAL archiving setup and you know what you're doing.
Default is stream. Leave it there unless you have a reason not to.
--checkpoint: fast vs spread
Before pg_basebackup starts copying data files, it needs a consistent starting checkpoint. The --checkpoint flag controls how that checkpoint happens:
- fast: requests an immediate checkpoint. Postgres flushes all dirty buffers to disk as quickly as possible. This causes a brief I/O spike. The backup starts sooner, but you pay for it up front.
- spread (default): requests a checkpoint but lets Postgres pace the writes according to checkpoint_completion_target. Lower I/O pressure, but the data copy doesn't start until that checkpoint completes, which can take several minutes (up to roughly checkpoint_timeout × checkpoint_completion_target).
For scheduled nightly backups: spread is fine. For an ad-hoc backup before a risky migration: use --checkpoint=fast so you're not waiting.
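The choice above can be captured in a small wrapper. This is a minimal sketch, not a finished tool: the function names and the "scheduled"/"adhoc" labels are invented for illustration, and connection settings are assumed to come from the usual PG* environment variables.

```shell
# Hypothetical helper: map a backup "kind" to a --checkpoint mode.
# The labels "adhoc" and "scheduled" are made up for this sketch.
checkpoint_mode() {
  case "$1" in
    adhoc)     echo fast ;;    # start immediately, accept the I/O spike
    scheduled) echo spread ;;  # let Postgres pace the checkpoint
    *)         echo "unknown backup kind: $1" >&2; return 1 ;;
  esac
}

# $1 = backup kind, $2 = target directory
run_basebackup() {
  mode=$(checkpoint_mode "$1") || return 1
  pg_basebackup -D "$2" --checkpoint="$mode" --progress
}
```

Called as run_basebackup adhoc /backup/pre-migration before a risky change, or run_basebackup scheduled /backup/nightly from cron.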
Compression
On large databases, uncompressed backups waste disk space and transfer time. Use:
pg_basebackup -D /backup --format=tar --compress=gzip:6
The :6 is the gzip compression level (1–9). Level 6 is a reasonable tradeoff between speed and size; levels 7–9 take noticeably longer for minimal extra savings. The method:level form requires pg_basebackup 15 or later. On older versions, --compress takes only a gzip level, so write --compress=6.
Since Postgres 15, --compress also accepts lz4 and zstd. LZ4 is faster with slightly worse ratio. Zstd matches gzip ratios at higher speed. Worth benchmarking on your specific data if backup speed matters.
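If one script has to run against several client versions, the compression arguments can be picked from the pg_basebackup version string. The sketch below assumes the usual "pg_basebackup (PostgreSQL) X.Y" output of pg_basebackup --version; the function names are hypothetical.

```shell
# Extract the major version from a string such as
# "pg_basebackup (PostgreSQL) 16.2".
pg_major() {
  printf '%s\n' "$1" | awk '{ split($3, v, /[^0-9]/); print v[1] }'
}

# $1 = client major version, $2 = gzip level
compress_args() {
  if [ "$1" -ge 15 ]; then
    # 15+ accepts the method:level form.
    echo "--format=tar --compress=gzip:$2"
  else
    # Before 15, --compress takes only a gzip level.
    echo "--format=tar --compress=$2"
  fi
}
```

A call might look like pg_basebackup -D /backup $(compress_args "$(pg_major "$(pg_basebackup --version)")" 6), with the command substitution left unquoted on purpose so the two flags split into separate arguments.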
Tablespaces and Parallelism
pg_basebackup has no --jobs option. It copies the cluster over a single data connection (plus a second connection for WAL with --wal-method=stream), and tablespaces are sent one after another, so spreading them across disks does not make the backup faster. If the copy is too slow, parallelism has to come from another tool: pgBackRest, for example, can copy files with several processes via its process-max setting.
Backup Manifests and pg_verifybackup
Since Postgres 13, pg_basebackup produces a backup_manifest file alongside the backup. This manifest contains checksums for every file.
Use it:
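pg_verifybackup /backup
This checks the manifest against the actual backup files. If anything was corrupted or truncated during transfer, pg_verifybackup tells you now, not at 3am when you try to restore. One caveat: before Postgres 18, pg_verifybackup reads only plain-format backups, so a tar-format backup has to be extracted first. From 18 on it can verify tar backups in place if you pass --no-parse-wal.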
Run pg_verifybackup as part of your backup script. Store the exit code. An unverified backup is not a backup.
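One way to store the exit code is sketched below. The function name and the .verify_status file are inventions of this example. The status is written next to the backup directory rather than inside it, because a file that is not listed in the manifest would itself be reported as a problem on the next verification.

```shell
# $1 = backup directory; any further arguments are passed to pg_verifybackup
verify_and_record() {
  dir=$1
  shift
  pg_verifybackup "$@" "$dir"
  status=$?
  # Record a UTC timestamp and the exit code beside the backup (hypothetical file name).
  printf '%s exit=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status" > "$dir.verify_status"
  return "$status"
}
```

A backup script can then call verify_and_record /backup/20240101 and alert on a non-zero return, while the status file leaves a record for later inspection.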
Retention and WAL Archiving
pg_basebackup alone gives you a single recovery point: the moment the backup completed. To restore to an arbitrary later point, you need continuous WAL archiving (archive_mode=on, archive_command configured) running alongside your backup rotation. The base backup is the floor; the WAL archive fills the space between backups.
Without WAL archiving, a base backup restores to its own end point and nothing later. That's still useful. It's just not PITR.
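For reference, the archiving settings mentioned above look roughly like this in postgresql.conf. The sketch prints them from a shell function so the archive directory can be passed in; the function name and the directory are placeholders, and the archive_command follows the simple cp-based pattern from the PostgreSQL documentation, which is a starting point rather than a production setup.

```shell
# $1 = archive directory (placeholder; must be writable by the postgres user)
archive_settings() {
  printf "wal_level = replica\n"
  printf "archive_mode = on\n"
  # %p is the path of the WAL file to archive, %f is its file name.
  # "test ! -f" refuses to overwrite a segment that is already archived.
  printf "archive_command = 'test ! -f %s/%%f && cp %%p %s/%%f'\n" "$1" "$1"
}
```

Running archive_settings /mnt/server/archivedir prints three lines that can be reviewed and merged into postgresql.conf. Changing archive_mode requires a server restart.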
What pg_basebackup Does Not Give You
- Incremental backups before Postgres 17. Every pg_basebackup run copies the entire cluster. On a 2TB database, every backup is 2TB. Incremental backup support arrived in Postgres 17 via pg_basebackup --incremental. On earlier versions, tools like pgBackRest or Barman handle this at the application layer.
- Logical backup. pg_basebackup produces a physical backup: a copy of the binary files. You cannot restore individual tables from it or move data to a different major Postgres version. For selective restore or cross-version migration, use pg_dump.
- Automatic restore testing. The backup works until you discover it doesn't. Schedule test restores. pg_basebackup doesn't automate this for you.
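On Postgres 17 and later, the incremental workflow can be sketched as two steps: take a backup relative to the manifest of an earlier one, then merge the chain with pg_combinebackup before restoring. The function names and paths here are illustrative, the server is assumed to have summarize_wal = on, and the backups are assumed to be in plain format (tar-format backups must be extracted before combining).

```shell
# $1 = directory of the previous backup, $2 = directory for the new incremental
incremental_backup() {
  pg_basebackup -D "$2" --incremental="$1/backup_manifest" --checkpoint=fast
}

# $1 = output directory; remaining arguments = the full backup first,
# then every incremental from oldest to newest
restore_chain() {
  out=$1
  shift
  pg_combinebackup -o "$out" "$@"
}
```

For example, incremental_backup /backup/full /backup/incr1 followed later by restore_chain /restore /backup/full /backup/incr1 produces a data directory in /restore that can be started like any restored full backup.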
A Minimal Working Script
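#!/usr/bin/env bash
set -euo pipefail

# Compute the target once, so a run that crosses midnight verifies the directory it wrote.
BACKUP_DIR=/backup/$(date +%Y%m%d)

pg_basebackup \
  -D "$BACKUP_DIR" \
  --format=tar \
  --compress=gzip:6 \
  --wal-method=stream \
  --checkpoint=fast \
  --progress

# Verifying tar format in place needs pg_verifybackup 18+; on 13-17, extract first or use --format=plain.
pg_verifybackup --no-parse-wal "$BACKUP_DIR"
echo "Backup verified: $(date)"
Add a retention policy. Delete backups older than N days. Test a restore before you need one.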
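A retention policy for the date-named directories used above could look like the following sketch. It assumes GNU or BSD find, that each backup lives in a directory named YYYYMMDD directly under the backup root, and that the directory's modification time reflects when the backup was taken. The function name is hypothetical.

```shell
# $1 = backup root, $2 = retention in days
prune_backups() {
  # Only touch top-level directories whose names are exactly eight digits,
  # so unrelated files under the backup root are left alone.
  find "$1" -mindepth 1 -maxdepth 1 -type d \
    -name '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' \
    -mtime +"$2" -exec rm -rf {} +
}
```

Running prune_backups /backup 14 after a successful, verified backup removes directories last modified more than 14 days ago. Pruning only after verification succeeds avoids deleting the last good backup when a new one fails.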