Postgres pg_stat_statements_reset: When and Why to Throw Away Query Statistics
pg_stat_statements_reset() looks like a clean slate function but throwing away cumulative query statistics destroys trend visibility. The right discipline is rare, scoped resets paired with snapshot capture.
The pg_stat_statements view in Postgres accumulates query statistics from the moment the extension is loaded or last reset. The total_exec_time, calls, rows, and buffer counters all monotonically increase. The view's value as a workload diagnostic comes from this accumulation — it lets you see what queries have actually cost you in aggregate over time, not what they happened to be doing in the last five minutes.
The reset functions — pg_stat_statements_reset() globally, or with arguments to scope to a specific query, user, or database — wipe the counters and start again. Used carelessly, they destroy the diagnostic record that makes the view useful. Used carefully, they isolate the effect of a specific change so you can measure its impact without baseline noise.
Why reset is destructive by default
The cumulative counters make rate calculations possible only when you can sample. If you snapshot pg_stat_statements every hour and store the snapshots, you can compute calls-per-hour and exec-time-per-hour from successive snapshots by subtracting. The actual rate over the last hour is the increase in the cumulative counter divided by the interval. This works because the counter only grows.
If you reset at hour boundaries instead, you get the same rate calculation for the current interval, but you lose any longer-window analysis. The query that dominated yesterday's workload is gone. The hourly graph of top-ten-by-total-time becomes a series of independent hourly snapshots with no continuity across them. Comparison against a baseline week — the standard tool for catching regressions after a deploy — becomes impossible.
The pattern that compounds the destruction is the diagnostic reset during incidents. Someone notices a slow query, runs pg_stat_statements_reset() to get a clean view of current load, and now nobody can see what the workload looked like before the incident started. The historical context that would have explained the regression is wiped.
When reset is the right call
The right uses are narrow and deliberate. Post-schema-change reset is one — you've added an index, dropped a column, or restructured a query, and you want to measure the effect on a specific query without the historical pre-change statistics blurring the new average. The discipline is to capture a snapshot first, do the reset, and let the new statistics accumulate over a representative window before drawing conclusions.
Post-bug-fix reset is another. You shipped a fix for a query that was producing pathological plans, and the historical bad-plan statistics will dominate the mean execution time for weeks otherwise. Resetting that specific queryid via pg_stat_statements_reset(userid, dbid, queryid) lets the fixed query's statistics accumulate from zero without affecting any other query's history.
The deliberate diagnostic window is the third — you want to measure exactly what a specific workload costs, isolated from background queries. Reset, run the workload, capture the snapshot, reset again. The pattern is more common in benchmarking than in production but it has legitimate operational uses when you need to attribute cost to a specific operation.
The capture-before-reset discipline
Every reset should be preceded by a snapshot. The simplest implementation is a snapshots table that captures the entire pg_stat_statements view at a moment in time. INSERT INTO pg_stat_statements_snapshots SELECT now() AS snapped_at, * FROM pg_stat_statements. The snapshots table grows but it grows slowly — even a tens-of-thousands-of-distinct-queries workload produces megabytes per snapshot, not gigabytes.
The snapshots table makes resets reversible in the sense that matters. You can still compute pre-reset cumulative statistics by querying the most recent snapshot. You can still do baseline comparison by querying a snapshot from before the incident. You can still graph long-term trends by computing differences between successive snapshots regardless of how many resets happened between them.
The snapshot table also gives you a path to longer-term retention than the live view supports. The live view is capped at pg_stat_statements.max entries — typically five thousand — and evicts long-tail queries when the cap is reached. The snapshot table can be retained indefinitely and queried for historical analysis that the live view cannot support.
The scoped reset path
The pg_stat_statements_reset(userid, dbid, queryid) variant lets you reset statistics for a specific query, leaving every other query's history intact. The use case is the post-bug-fix scenario — you want the historical bad-plan statistics for one query to go away without affecting any other query.
The mechanics are slightly fiddly. You have to know the queryid of the query you want to reset, which means you have to identify it in the view first. SELECT queryid, query FROM pg_stat_statements WHERE query LIKE '%my_problematic_pattern%' gets you the queryid, then pg_stat_statements_reset(0, 0, queryid) with zero values for userid and dbid wildcards across all users and all databases. The zero-wildcard semantics are documented but easy to misremember.
The other scoped variant is per-user reset, which is occasionally useful when a specific application or background job has accumulated misleading historical statistics. The application case is rarer than the per-queryid case in practice.
What does not get reset
pg_stat_statements_reset() resets the view's own counters but not other related diagnostics. pg_stat_database, pg_stat_user_tables, pg_stat_user_indexes, and pg_stat_io are all separate cumulative views with their own reset functions — pg_stat_reset() for most of the per-database statistics, pg_stat_reset_shared() for shared statistics like the WAL counters. A team that resets pg_stat_statements and assumes other statistics also reset will be confused when the rate calculations from those views disagree.
The auto_explain log records also persist across resets. If you have auto_explain configured with log_min_duration_statement, the slow-query records in the Postgres log remain after pg_stat_statements_reset() and can substitute for the lost cumulative view as a historical record of individual slow query occurrences.
The discipline that compounds
The teams that get the most value from pg_stat_statements run hourly or six-hourly snapshots to a snapshots table, reset only when they have a specific isolating reason, and treat the live view as the most-recent-window picture rather than the canonical historical record. The snapshots table is the canonical record.
The pattern requires a small cron job and a small table. It pays back the first time you need to compare current load against a baseline from last month, or attribute a deploy-time regression to a specific query that the live view no longer shows because it has been evicted from the top-N. Both of those moments come up in production operations multiple times per year, and both depend on having the historical data when you need it.
Read more essays and technical writing at anethoth.com — a notebook on databases, distributed systems, biology, and the engineering that holds the world together.