Postgres Lock Contention: How to Find It and the Patterns That Avoid It
Lock contention shows up as inexplicable slow queries, queries that get faster when traffic decreases, and tail-latency spikes that correlate with nothing in your application logs. The detection is mostly knowing where to look, and the avoidance is mostly knowing what kinds of transactions hold
Lock contention is one of the most frustrating classes of database problem because the symptoms are indirect. A query that normally takes 10ms takes 5 seconds with no plan change, no data growth, no obvious correlate in CPU or I/O. The query plan looks identical. The same query run against a quiet replica is fast. The bug is real but the standard query-tuning playbook does not touch it.
The reason is that the query is not slow because it is doing more work; it is slow because it is waiting. Some other transaction holds a lock the query needs, and the query sits in a wait queue until the lock is released. The work the query is doing might be milliseconds; the waiting is everything. This pattern shows up across every database with concurrency control, and we have hit it in production on DocuMint, CronPing, FlagBit, and WebhookVault at different points. The detection and the avoidance are different skills, and most teams underinvest in both.
The lock hierarchy you need to know
Postgres has eight table-level lock modes, but for practical purposes there are three you need to internalize. ACCESS SHARE is what plain SELECT takes; it conflicts only with ACCESS EXCLUSIVE, which is what ALTER TABLE and DROP TABLE take. ROW EXCLUSIVE is what INSERT, UPDATE, and DELETE take; it conflicts with SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE. SHARE UPDATE EXCLUSIVE is what VACUUM (non-FULL), CREATE INDEX CONCURRENTLY, and ANALYZE take; it conflicts with most schema-change locks but allows reads and writes.
The non-obvious fact is that SELECT and INSERT/UPDATE/DELETE do not block each other at the table level. The contention between them happens at the row level, and is the most common source of lock waits in production. The row-level lock taken by an UPDATE blocks another UPDATE of the same row, blocks SELECT ... FOR UPDATE of the same row, but does not block ordinary SELECT.
At the table level, the contention you have to worry about is mostly DDL. An ALTER TABLE takes ACCESS EXCLUSIVE, which conflicts with every other lock mode, including the ACCESS SHARE taken by ordinary SELECT. A migration that runs ALTER TABLE my_busy_table ADD COLUMN ... against a table with active traffic will block every query against that table for the duration of the alter. The alter itself may complete in microseconds (adding a nullable column without default is fast), but it has to wait for every existing query against the table to finish, and every new query has to wait for the alter to acquire its lock.
How to find what is blocking what
The minimum viable diagnostic is the pg_stat_activity view joined to itself through pg_blocking_pids(). A query like:
SELECT pid, usename, query_start, state, wait_event_type, wait_event,
pg_blocking_pids(pid) AS blocking_pids,
LEFT(query, 200) AS query
FROM pg_stat_activity
WHERE state != 'idle' AND wait_event_type = 'Lock';This returns every session currently waiting on a lock, what kind of wait it is on, and which session is holding the lock that is blocking it. The blocking session might itself be waiting on another lock, producing a chain. The trick is that the session at the head of the chain is often idle in transaction, which means it is not running a query but has not committed or rolled back. That is the most common cause of mysterious lock contention: a session that started a transaction, ran some statements, and then was abandoned by the application without commit or rollback.
The idle in transaction pattern usually means a bug in the application: a connection-pool eviction that left a transaction open, a request handler that returned without ending the transaction, or a long-running computation between the first and last statements of a transaction. The fix is at the application level. The mitigation that buys time is idle_in_transaction_session_timeout, which kills sessions that have been idle in a transaction longer than a configurable threshold. Setting it to 60 seconds catches most of these cases without disrupting legitimate workloads.
The deeper diagnostic is pg_locks, which shows every lock currently held or requested. Joining pg_locks against pg_stat_activity and filtering by granted = false gives a precise picture of what is waiting on what. This is overkill for most incidents but invaluable when the simple pg_blocking_pids view is not enough, particularly for complex lock dependencies involving advisory locks or row-level locks.
The lock_timeout safety net
Setting lock_timeout to a small value before any DDL is one of the highest-leverage configuration changes in operational Postgres. A migration that times out trying to acquire a lock is much better than a migration that waits indefinitely while blocking every query in production.
The pattern is SET lock_timeout = '5s'; at the start of a migration session, followed by the DDL. If the DDL cannot acquire its lock within five seconds, it errors out instead of blocking traffic. The migration can be retried later; the operational damage is bounded. We use this on every migration across our products, and it has caught two production incidents that would have been customer-visible outages without it.
The same pattern applies to application-level statements that are known to be at risk of lock contention. A background job that does maintenance work on a busy table should set a short lock_timeout at the session level, so that if it cannot get the lock quickly, it fails fast and can be rescheduled rather than blocking traffic.
Patterns that avoid the contention
The first pattern is short transactions. The longer a transaction holds locks, the larger the window for contention. Reading data, computing decisions, and waiting on external services should all happen outside the transaction. The transaction should be the minimum write-and-commit window. A 500ms transaction that touches a hot row will produce contention; a 5ms transaction touching the same row will not.
The second pattern is consistent lock acquisition order. Deadlocks happen when transaction A holds lock X and waits for lock Y, while transaction B holds lock Y and waits for lock X. The fix is that every transaction acquires locks in the same order — usually by table primary key, alphabetical table name, or some other globally-consistent ordering. The bug is usually that one code path acquires locks in a different order than the others, often because the developer was not thinking about locking at all.
The third pattern is avoiding hot rows. A single row that every transaction needs to update is a serialization point. The common cases are global counters, per-tenant counters with hot tenants, and queue tables where every consumer claims the next row. The fixes are to split the hot row into many rows (a counter sharded across N rows that you sum on read), to use SELECT ... FOR UPDATE SKIP LOCKED on queue tables so that multiple consumers can claim different rows in parallel, or to do the writes asynchronously through a queue rather than synchronously on the request path.
The fourth pattern is using NOWAIT or SKIP LOCKED on row locks instead of plain FOR UPDATE. NOWAIT errors immediately if the row is locked, which lets the application back off and retry rather than waiting. SKIP LOCKED returns rows that are not currently locked, which is the right pattern for queue consumers where any row is as good as any other.
The operational signals
Three signals to monitor on a busy Postgres. First, the count of sessions in wait_event_type = 'Lock', sampled every few seconds. A small steady number is normal; a sudden climb is the leading indicator of contention. Second, the longest-running transaction duration, which catches the abandoned-transaction pattern before it cascades. Third, the deadlock count from pg_stat_database, which should be near zero in a healthy system; a steady stream of deadlocks indicates a real bug somewhere in the application's locking order.
The deeper observation
Lock contention is the database doing exactly what concurrency control is supposed to do: making sure that conflicting writes are serialized correctly. The problem is not the locks; it is that the application is asking for more concurrent access to the same data than the schema and the transaction structure can support. The fix is almost always at the application level — shorter transactions, finer-grained data, less contention by design — rather than at the database level. The tools you need are the ones that show you what locks are being taken and held, and the most expensive operational mistake is not knowing how to ask the database what it is waiting for.