Postgres pg_stat_statements: Finding Your Slowest Queries Before Users Do

Your database has been timing every query since the day you added pg_stat_statements to shared_preload_libraries. Most teams never look at the data.

Setup

-- postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 10000

-- Restart Postgres, then:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

The restart is required. The extension registers itself during shared memory initialization — no hot-load path exists.

The columns that matter

The view accumulates data across all connections since the last reset. These are the useful ones:

calls — how many times this query pattern ran
mean_exec_time — average wall-clock execution time in milliseconds. The number you actually care about.
total_exec_time — calls × mean. Tells you where the cumulative load is.
stddev_exec_time — variance in execution time. High stddev next to high mean means some executions are catastrophically slow.
rows — total rows returned. Useful for queries returning far more data than expected.

Finding your slowest queries

SELECT
  round(mean_exec_time::numeric, 2) AS mean_ms,
  calls,
  round(total_exec_time::numeric, 0) AS total_ms,
  left(query, 100) AS query_preview
FROM pg_stat_statements
WHERE calls > 10
ORDER BY mean_exec_time DESC
LIMIT 10;

The calls > 10 filter removes one-off queries that appear expensive but happen once. The pattern to look for: high mean_ms, high calls, and high total_ms together. That's a slow query running constantly.

The stddev signal

A query with mean_exec_time of 5ms and stddev_exec_time of 80ms is not a 5ms query. It's bimodal — sometimes fast, occasionally taking hundreds of milliseconds. That variance usually comes from lock contention, autovacuum interference, or plan instability. Mean alone misses it.

Reset discipline

Call pg_stat_statements_reset() after fixing a slow query so new baselines don't drown in old data. Don't clear blindly — you lose evidence for every query you haven't fixed yet. A weekly reset schedule after deployments is reasonable for production.

track_planning and track_utility

pg_stat_statements.track_planning = on records planning time separately from execution. On schemas with frequent DDL changes, planning time can be non-trivial. Off by default. Enable it only if you suspect planner time is the actual bottleneck.

track = all (the default is top) captures queries inside stored procedures and functions. If you're using PL/pgSQL, set this to get full coverage.

Query normalization

Literal values become $1, $2, etc. WHERE id = 42 and WHERE id = 99 appear as the same row. This is correct behavior — you want per-pattern statistics. But it means the actual parameter values are gone. Intentional.

What it does not show

Execution plans. A query can have mean_exec_time of 200ms without telling you whether it's a sequential scan or a hash join gone wrong. Use EXPLAIN (ANALYZE, BUFFERS) on the normalized form. It doesn't show per-connection breakdown either — you see aggregate stats across all callers. Pair with pg_stat_activity if you need to trace load to a specific application.

The data has been there the whole time. Most outages have a slow-query story underneath them. This view is where you find it before users do.

Building something? Prove it.

builds.anethoth.com — public build dossiers for software in progress.