SQLite WAL Mode Checkpointing: When Passive Isn't Enough

WAL mode gives SQLite concurrent reads without locking writers—but the checkpoint that flushes WAL frames back to the database file is where things quietly go wrong. The default checkpoint mode is PASSIVE, which means it only writes frames that no reader is currently accessing. Under sustained read load, passive checkpoints can stall indefinitely, leaving your WAL file growing without bound.

SQLite has four checkpoint modes:

PASSIVE — Writes as many frames as possible without waiting. Skips frames held by active readers. Returns immediately. Default for auto-checkpoint.
FULL — Waits until all readers have released their read locks before checkpointing. Blocks new readers from starting during the checkpoint. Does not reset the WAL.
RESTART — Like FULL, but also waits until all readers have left the WAL entirely, then resets the WAL write position to frame 0. Next write starts at the beginning of the WAL file.
TRUNCATE — Like RESTART, but also truncates the WAL file to zero bytes. Guarantees a clean slate, costs a filesystem operation.

The practical difference between FULL and RESTART matters if you want to bound WAL file size. FULL completes the checkpoint but doesn't reset the write position—your WAL file stays at whatever size it grew to, and future writes append from the high-water mark. RESTART resets the write position, so the next transaction overwrites from frame 1. TRUNCATE goes further and shrinks the file on disk.

When Auto-Checkpoint Stalls

SQLite triggers an automatic PASSIVE checkpoint every 1000 frames by default (PRAGMA wal_autocheckpoint). In a read-heavy workload—a reporting query holding a read lock for 30 seconds, a long-running analytics transaction, a connection pool with idle connections that opened transactions and forgot to close them—passive checkpoints can fail to make progress entirely. The WAL file grows. Reads stay fast (they're reading from the WAL). Writes work fine. Nothing breaks. The WAL just keeps growing.

You notice this when your SQLite data directory hits unexpected disk pressure. Run PRAGMA wal_checkpoint(FULL) and watch the output:

sqlite> PRAGMA wal_checkpoint(FULL);
-- Returns: (busy, log, checkpointed)
-- busy=1 means a reader was blocking the checkpoint

If busy is non-zero, a reader held a lock during the checkpoint window. PASSIVE and FULL both surface this. If log equals checkpointed, the checkpoint completed. If not, you have outstanding frames.

The Busy Handler Interaction

Checkpoint modes that wait (FULL, RESTART, TRUNCATE) respect SQLite's busy handler. If you've set a busy timeout with sqlite3_busy_timeout() or PRAGMA busy_timeout, the checkpoint will retry within that window before giving up. Without a busy handler, FULL returns immediately if a reader is blocking—it doesn't actually wait. This surprises people. "I ran PRAGMA wal_checkpoint(FULL) and it still didn't complete." Right—because you didn't set a busy timeout.

Set one:

PRAGMA busy_timeout = 5000; -- 5 seconds
PRAGMA wal_checkpoint(RESTART);

Now the checkpoint waits up to 5 seconds for readers to clear before giving up.

Tuning wal_autocheckpoint

PRAGMA wal_autocheckpoint = N controls how many WAL frames trigger an automatic checkpoint. The default is 1000 frames (roughly 4MB with 4096-byte pages). For indie SaaS applications where writes are infrequent and you care about disk hygiene, dropping this to 100–200 frames is reasonable. For write-heavy workloads where checkpoint latency hurts, raise it or disable it (wal_autocheckpoint = 0) and checkpoint manually on a schedule.

Disabling auto-checkpoint entirely is valid if you have a background thread or cron job calling RESTART or TRUNCATE on a schedule. It gives you predictable latency on writes at the cost of owning the checkpoint lifecycle.

Practical Guidance for Indie SaaS on SQLite

The setup that works without surprises: WAL mode on, PRAGMA busy_timeout = 5000, PRAGMA wal_autocheckpoint = 200, and a nightly TRUNCATE checkpoint from a maintenance task. This keeps WAL file size bounded, gives reads the concurrency they need, and avoids silent WAL growth in the background.

If you're on Litestream, the WAL frames are being replicated before checkpointing—checkpoint too aggressively and you'll lose the replication window. Litestream recommends disabling auto-checkpoint and letting it control the checkpoint trigger. Read its docs before tuning.

The mistake to avoid: assuming WAL mode fixes all locking problems and never thinking about checkpoints again. WAL solves reader-writer contention. It doesn't solve WAL file growth, and it doesn't checkpoint itself correctly under concurrent readers without help.

More from Anethoth: anethoth.com · builds.anethoth.com