Postgres pg_stat_progress_analyze: Watching ANALYZE Work in Real Time

ANALYZE has been running in the background of your Postgres cluster for as long as you've had one. It collects statistics that the query planner uses to make decisions. When those statistics are stale, you get bad plans. Bad plans cause bad queries. Bad queries cause incidents. You've probably debugged this chain before without realizing what started it.

Since Postgres 14, you can watch ANALYZE work in real time via pg_stat_progress_analyze. Almost nobody knows it exists.

The view columns

The view has nine columns worth knowing:

pid — process ID of the ANALYZE worker
relid — OID of the table being analyzed (join to pg_class for the name)
phase — current phase of the operation (see below)
sample_blks_total — number of heap blocks that will be sampled
sample_blks_scanned — number of heap blocks scanned so far
ext_stats_total — total extended statistics objects on this table
ext_stats_computed — extended statistics computed so far
child_tables_total — for partitioned tables, total child tables
child_tables_done — child tables analyzed so far

The progress estimate is straightforward: sample_blks_scanned / sample_blks_total gives you fraction complete. Multiply by the elapsed wall time to estimate ETA.

SELECT
  p.pid,
  c.relname AS table,
  p.phase,
  p.sample_blks_scanned,
  p.sample_blks_total,
  ROUND(100.0 * p.sample_blks_scanned / NULLIF(p.sample_blks_total, 0), 1) AS pct_done
FROM pg_stat_progress_analyze p
JOIN pg_class c ON c.oid = p.relid;

The five phases

ANALYZE moves through five phases:

initializing — setting up internal state, acquiring locks
acquiring sample rows — sampling heap pages; this is where the blocks counters increment
computing statistics — processing the sampled rows into histogram and MCV data
computing extended statistics — if you have any CREATE STATISTICS objects on the table, they're computed here
finalizing analyze — writing statistics to pg_statistic and updating pg_class

Most of the time is in phase 2. If you see it stuck in "acquiring sample rows" for a long time on a large table, you're looking at I/O pressure, not a hang.

Why ANALYZE matters for planner accuracy

The query planner estimates row counts before executing any query. It uses column statistics stored in pg_statistic — histograms, most-common values, null fractions, correlation coefficients. When these statistics are stale, the planner makes bad estimates. Bad estimates lead to wrong join strategies, wrong index choices, wrong memory grants for hash joins.

The classic failure mode: you bulk-load 10 million rows into a table. The table previously had 50,000 rows. The planner still thinks it has 50,000 rows. Every query that touches that table gets a plan optimized for a small table. Sequential scans replace index scans. Nested loop joins replace hash joins. Performance degrades by orders of magnitude.

ANALYZE fixes this. Autovacuum handles ANALYZE automatically in steady state. The problem is steady state is rarely the dangerous condition.

When manual ANALYZE is warranted

Four cases where you should trigger ANALYZE explicitly rather than waiting for autovacuum:

Post-bulk-load. After inserting a large volume of rows, statistics are immediately stale. Autovacuum has a threshold and a delay before it fires. Run ANALYZE table_name immediately after the load completes.

Post-migration backfill. Schema migrations that backfill data create the same stale-statistics problem. Any migration that touches more than ~10% of rows needs a manual ANALYZE at the end.

After significant deletes. VACUUM handles dead tuples; ANALYZE handles statistics. They're triggered separately. A table that was 90% deleted and repopulated has bad statistics until ANALYZE runs. Autovacuum usually handles this, but not always on schedule.

After schema changes. Adding a column with NOT NULL DEFAULT in Postgres 11+ is instant (metadata-only), but the new column has no statistics. ANALYZE the table before queries start using the new column in WHERE clauses.

The autovacuum threshold formula

Autovacuum triggers ANALYZE when:

n_mod_since_analyze >= autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * n_live_tup

The defaults are autovacuum_analyze_threshold = 50 and autovacuum_analyze_scale_factor = 0.2. For a table with 1 million rows, autovacuum won't trigger ANALYZE until 200,050 rows have been modified. If you insert 200,000 rows at once, it fires quickly. If you insert 1,000 rows a day, it takes 200 days.

Watch pg_stat_user_tables for tables with high n_mod_since_analyze and old last_analyze timestamps.

SELECT relname, n_live_tup, n_mod_since_analyze, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE n_mod_since_analyze > 0.1 * n_live_tup
ORDER BY n_mod_since_analyze DESC
LIMIT 20;

What the view does not show

A few limitations worth knowing before you rely on this view:

pg_stat_progress_analyze shows per-table progress, but it does not show per-column statistics quality. You cannot tell from this view whether a specific column's histogram is accurate or how many MCV values are tracked. For that, query pg_statistic or pg_stats directly.

The view does not show correlation drift. A column's physical correlation with heap order can change as rows are inserted and updated, affecting index scan cost estimates. This is invisible to the progress view.

Extended statistics effectiveness is not shown. If you've created a CREATE STATISTICS object for multi-column correlation, the progress view shows that it ran, not whether it produced useful data.

Comparison with sibling views

Postgres has three progress views for maintenance operations: pg_stat_progress_analyze, pg_stat_progress_vacuum, and pg_stat_progress_create_index. All three follow the same pattern: pid, relid, phase, counters for work done vs. total.

VACUUM is typically slower than ANALYZE on the same table because it has to process dead tuples in addition to sampling. CREATE INDEX is often the slowest of the three on large tables, especially with CONCURRENTLY. ANALYZE is usually the fastest maintenance operation on a per-table basis.

If you're watching a maintenance window that includes all three operations, check all three views. An ANALYZE completing quickly while VACUUM is still running on the same table is normal. An ANALYZE stuck in "acquiring sample rows" longer than you'd expect usually means I/O contention with VACUUM on the same or adjacent tables.

The practical habit: after any bulk data operation, check that ANALYZE completes before putting the table back into production query paths. pg_stat_progress_analyze makes this observable.

Building in public at builds.anethoth.com.