Postgres pg_wait_sampling: Profiling What Your Database Is Waiting For

pg_stat_activity shows the wait events for currently-active sessions at the moment you query it. pg_wait_sampling captures them continuously by sampling at high frequency and aggregating. The view that results is one of the most useful diagnostic surfaces for a busy database.

The pg_stat_activity view exposes the wait_event_type and wait_event columns for every active session, which tells you what each session is waiting on at the instant you ran the query. That is useful for incident diagnosis when something is currently slow. It is not useful for understanding what your database has been waiting on across the last hour, day, or week. The pg_stat_activity counters are not cumulative for wait events: they reflect the current state of each session and are overwritten constantly. To get a distribution, you need to sample.

pg_wait_sampling is a Postgres extension that does the sampling for you. It is maintained by Postgres Pro and packaged in most distributions. Installed and configured, it samples pg_stat_activity at a configurable frequency (default 10 ms) and aggregates the wait events into a view you can query. The aggregation is what makes it useful: instead of one snapshot of what is happening now, you get a histogram of what has been happening over a window.

What pg_wait_sampling exposes

The extension provides three views. pg_wait_sampling_current shows the most recent sample for each backend, which is similar to a slightly delayed pg_stat_activity but partitioned by sample time. pg_wait_sampling_profile shows the cumulative count of how many times each (pid, query_id, event_type, event) tuple has been sampled since the last reset. pg_wait_sampling_history shows a rolling window of recent samples for analysis.

The profile view is the one you read most often. A query like SELECT event_type, event, count(*) FROM pg_wait_sampling_profile GROUP BY 1, 2 ORDER BY count(*) DESC LIMIT 20 gives you the top wait events across the sampling period. If 60 percent of samples show ClientRead, your bottleneck is application-side. If 60 percent show IO/DataFileRead, your bottleneck is disk. If 60 percent show Lock/transactionid, your bottleneck is row-lock contention. The diagnostic is direct.

Configuration

The extension has to be added to shared_preload_libraries and the cluster restarted. Once loaded, the configuration knobs that matter are pg_wait_sampling.profile_period (default 10 ms), pg_wait_sampling.history_period (default 10 ms), pg_wait_sampling.history_size (default 5000 samples retained in memory), and pg_wait_sampling.profile_pid (default on, includes pid in the aggregation key).

The profile_period is the most consequential knob. Setting it too low (1 ms or below) increases overhead and produces samples that are not statistically independent because the sampler runs more often than the work being sampled. Setting it too high (100 ms or above) loses resolution for short waits. The 10 ms default is the right starting value for most workloads.

The profile_pid knob is worth flipping off for long-running production use. With it on, the aggregation key includes the pid, which means a database with many short-lived backends accumulates many distinct keys and the profile view grows large. With it off, samples are aggregated across pids and the view stays compact at the cost of losing per-pid attribution.

Common diagnostic patterns

The first pattern is identifying whether the database is the bottleneck. If most samples show ClientRead or Activity/ClientRead, the backend is waiting on the client to send the next query. The database is idle most of the time and the slowness is application-side. If most samples show CPU (the special event that indicates the backend is on-CPU, not waiting), the database is CPU-bound. If most samples show IO or Lock waits, the database is constrained by disk or contention.

The second pattern is finding the specific wait events that account for the majority of wait time on slow queries. By joining pg_wait_sampling_profile to pg_stat_statements via query_id (in versions that support it), you can attribute waits to specific queries. The diagnostic that emerges is "query X is slow because it spends 80 percent of its time waiting on Lock/relation," which is more directly actionable than "query X is slow."

The third pattern is monitoring long-running queries and seeing when their wait pattern changes. A query that normally shows mostly CPU and IO/DataFileRead waits and suddenly shows mostly Lock/relation waits has hit a contention pattern that did not exist before. The change in wait distribution is often the first signal of a workload change that is about to cause an incident.

What pg_wait_sampling does not show

The extension does not show wait events for sessions that have already finished. The sampling happens at fixed intervals and short queries can complete between samples without being captured. The histogram is therefore biased toward long-running queries. For workloads dominated by short queries, the extension undersamples and pg_stat_statements is a better diagnostic.

The extension does not show why a backend is waiting on a specific lock or IO operation. It tells you the wait event but not the cause. For lock waits, the diagnostic has to be completed by joining to pg_locks. For IO waits, the diagnostic has to be completed by looking at the query plan and the buffer cache state.

The extension does not capture wait events on standby servers in a way that is correlated with the primary. Streaming replicas have their own wait patterns driven by replay rather than by the primary's workload, and the extension samples each independently.

The deeper observation

Wait-event sampling is one of the cases where a small operational addition produces a large diagnostic improvement. The Postgres core does the work of identifying what each backend is waiting on, but it does not aggregate that information across time. pg_wait_sampling does the aggregation. The view that results is one of the most useful diagnostic surfaces for a busy database, and it is underused because the extension is not installed by default. Adding it before the first incident pays back in proportion to how many incidents come.


Read more essays and technical writing at anethoth.com — a notebook on databases, distributed systems, biology, and the engineering that holds the world together.