Postgres pg_stat_activity: Diagnosing What Your Connections Are Actually Doing
When the database feels slow but no individual query is slow, the answer is usually visible in pg_stat_activity. Reading connection states and wait events is the diagnostic skill that separates intermediate from senior Postgres operators.
The pg_stat_statements extension shows what queries are doing in aggregate. The pg_stat_activity view shows what every individual connection is doing right now. The two are complementary: pg_stat_statements tells you which queries are expensive in total, and pg_stat_activity tells you which queries and connections are currently consuming resources or blocked waiting for something.
Most production Postgres operators check pg_stat_activity informally when something seems wrong, but few teams build the diagnostic muscle to use it systematically. The view is one of the highest-leverage observability surfaces in Postgres, and reading it well is one of those skills that compounds across every database incident.
What pg_stat_activity contains
One row per connection. The interesting columns: pid identifies the backend process, state reports what the connection is currently doing in one of five values (active, idle, idle in transaction, idle in transaction (aborted), disabled), wait_event_type and wait_event name what the connection is waiting on if anything, query shows the current or last query text, query_start shows when the current statement began, state_change shows when the connection entered its current state, and backend_start shows when the connection was established.
The view filters by current user permissions by default. A connection running as a normal application user sees only its own activity; a superuser or a user granted pg_read_all_stats sees everything. For monitoring, create a dedicated monitoring user with pg_read_all_stats membership and use it for diagnostic queries.
The five connection states
The state column is where most diagnosis starts. active means the connection is executing a query right now. idle means the connection is connected but waiting for the client to send something. idle in transaction means the connection has BEGIN but is waiting for the client to send the next statement; this is the dangerous state because the open transaction holds locks and snapshots that prevent vacuum from cleaning up. idle in transaction (aborted) is the same but the transaction has hit an error and is waiting for the client to ROLLBACK. disabled means session-level tracking was turned off, which is rare in normal use.
The state-count summary query is the first thing to run during an incident: SELECT state, count(*) FROM pg_stat_activity GROUP BY state. A healthy production database has most connections in idle (waiting for client work) and a small number in active (executing queries). A database under load has more active and fewer idle. A database with application-side connection leaks has many connections that are idle but never released; the application is holding them in its pool. A database with the canonical idle-in-transaction problem has measurable idle in transaction counts and the longest-idle-in-transaction connections are blocking vacuum.
Reading wait events
When a connection is active but slow, the wait_event_type and wait_event columns tell you what it is waiting on. The categories that matter most: Lock means waiting for a database lock (row, table, transaction, or advisory). LWLock means waiting for a Postgres internal lock, usually around shared buffers or WAL. IO means waiting on disk read or write. Client means waiting on the application to send or receive data. IPC means inter-process communication, usually around parallel query workers. A null wait_event_type means the connection is making forward progress.
The Lock waits are usually what application-side users notice. The specific lock type (RowShareLock, RowExclusiveLock, ShareUpdateExclusiveLock, etc.) tells you what's blocking. The pg_blocking_pids(pid) function returns the PIDs that are blocking a given connection, and chaining this with a self-join on pg_stat_activity produces a who-is-blocking-whom report. The chain is sometimes deep: connection A is blocked by B which is blocked by C which is idle in transaction, and the actual problem is the idle-in-transaction connection at the end of the chain.
The idle-in-transaction diagnostic
The longest idle-in-transaction connections are usually the most consequential operational problem in pg_stat_activity. The query: SELECT pid, now() - state_change AS idle_duration, query FROM pg_stat_activity WHERE state = 'idle in transaction' ORDER BY idle_duration DESC LIMIT 10. The output tells you which connections have open transactions that have been idle for an unusual length of time.
The common causes: application code that runs a query, does heavy in-memory work, and then runs another query in the same transaction; framework defaults that wrap every request in a transaction even for read-only paths; long-running migrations that leave open transactions during their slow steps. Each of these is fixable at the application level, and pg_stat_activity is how you find them. The Postgres safety net is idle_in_transaction_session_timeout, which kills connections that have been idle in transaction for more than a configured duration. Set it to a value that catches bugs but does not interfere with normal long-running operations; 30 minutes is a reasonable default for typical SaaS workloads.
The long-running query diagnostic
The complementary query for active long-running statements: SELECT pid, now() - query_start AS duration, state, wait_event_type, wait_event, query FROM pg_stat_activity WHERE state != 'idle' ORDER BY duration DESC LIMIT 10. The output shows what is currently running and how long it has been running. The wait_event columns are critical here: a query running for 10 minutes with no wait_event is doing CPU work and likely needs query optimization; a query running for 10 minutes with Lock wait events is blocked and likely needs the blocker resolved; a query running for 10 minutes with IO waits is reading a lot of data and likely needs index work.
The pg_cancel_backend(pid) function asks a backend to cancel its current query; the harsher pg_terminate_backend(pid) kills the connection entirely. Both are necessary tools but neither should be reflexive. Cancel a query when you understand why it is running and the operational cost of waiting is higher than the cost of failing the request; terminate a connection when cancel is not enough, which is usually because the backend is stuck in a non-cancellable state like waiting on disk.
Connection pool diagnosis
The application-side connection pool is often the source of unexpected activity patterns. Symptoms visible in pg_stat_activity: many connections from the same application_name and client_addr with mostly idle states (oversized pool), connections with very old backend_start times (connection lifetime not being enforced), connections in idle in transaction (application-side transaction-leaking bug), connections in unexpected states for the application logic (orphaned transactions from earlier failed deploys).
The application_name and client_addr columns are how you correlate connection-level state with application behavior. Set application_name in the connection string to something diagnostic (service name plus pod identifier in Kubernetes deployments) so that pg_stat_activity output is immediately attributable. The standard pattern: SET application_name = 'documint-api-prod-pod-abc123' at connection acquisition time, so every backend identifies its source even after the application has moved to a different host.
The pg_stat_activity sampling pattern
The view shows current state, not historical state. For trend analysis, take periodic snapshots and store them in a separate table or external monitoring system. A typical sampling implementation runs the view through a cron job every 30 seconds and inserts rows into a snapshots table, partitioned by day for cheap deletion. The schema captures pid, state, wait_event_type, wait_event, query (hashed or truncated for retention), and observation_time.
The retained snapshots are surprisingly diagnostic during post-incident analysis. The pattern of how state distributions changed during the incident, what wait events dominated, which queries were running during the peak, all become readable from the snapshots in ways that the live view did not capture because nobody was watching at the moment. The cost is modest: 100 connections sampled every 30 seconds for 7 days is roughly 2 million rows, which fits in a tenth of a gigabyte and compresses well.
What pg_stat_activity does not show
The view shows the most recent query each connection executed, but does not show the history of queries within a transaction. If you see a connection in idle-in-transaction state with query showing the last statement it ran, you do not know what came before in the same transaction. For that, you need application-side request logging or transaction-level audit infrastructure.
The view also does not show queries that completed quickly. A connection that ran 1000 fast queries in the last second appears as idle with the most recent query, not as active with the workload it has been doing. The aggregate workload picture is what pg_stat_statements provides, complementing the per-connection current-state picture from pg_stat_activity.
Across our four products
We run DocuMint, CronPing, FlagBit, and WebhookVault on SQLite, which has no comparable runtime-introspection view. The equivalent in SQLite is reading the lock state via the C API or watching application-side metrics for connection-pool depth and query duration. The lack of pg_stat_activity-like introspection is one of the costs we pay for SQLite's operational simplicity; when we eventually migrate to Postgres for any product, the diagnostic surface this view provides will be one of the largest qualitative gains.
The deeper observation is that databases under load fail in connection-level ways that are not visible at the query level. A query that runs in 5 milliseconds is fine in isolation but is a problem if 50 connections are running it simultaneously and waiting on the same lock. The pg_stat_activity view is the diagnostic surface for this class of problem, and reading it well requires building the muscle on small incidents so that the skill is available when a large one happens.