engineering

Postgres Extensions in Production: pg_stat_statements, pgcrypto, and the Ones That Earn Their Cost

Postgres ships with a small kernel and an extension system that lets you bolt on functionality without forking the database. Most extensions are not worth the operational cost. A few earn their keep across nearly every production deployment.

Anethoth

14 May 2026 — 7 min read

Postgres is unusual among major databases in that its core is deliberately small and most of its interesting capabilities live in extensions. The extension system was added in version 9.1 in 2011, and the result over the following fifteen years is a Cambrian explosion of bolt-on functionality covering everything from spatial indexes to vector similarity search to time-series storage to foreign data wrappers for nearly every other database in existence.

The temptation is to install every extension that sounds useful. The discipline that pays off is to install the small set that genuinely earns its operational cost. We have run various Postgres extensions in production across DocuMint, CronPing, FlagBit, and WebhookVault, and the pattern that holds up is that fewer extensions, deeply integrated, beats more extensions used reflexively.

Why extensions have operational cost

Every extension you install is code that runs inside your database server. It can crash the database, corrupt data, slow queries, hold locks longer than they should be held, and produce subtle behavior differences across versions. The Postgres core is famously stable; extensions are often less so, especially the ones maintained by single individuals or that have not seen recent commits.

Extensions also affect upgrade paths. A major version upgrade of Postgres requires every installed extension to also have a version compatible with the new Postgres version. We have delayed upgrades by months because one extension we did not strictly need had not been updated for the latest Postgres release. The extension was doing work that could have been done in application code; the cost was disproportionate to the value.

The right starting question for any extension is not "what does it do" but "what is the cost of removing it later if it becomes a problem." Extensions with isolated functionality and few dependents are cheap to remove; extensions that pervade the schema are not.

pg_stat_statements: the one every production database should have

If you only install one extension, install pg_stat_statements. It tracks query execution statistics aggregated by query shape, with the parameters normalized so that SELECT * FROM users WHERE id = 1 and SELECT * FROM users WHERE id = 2 count as the same query. The aggregation is by parameterized query text, so you see total calls, total time, mean time, rows returned, and several other metrics per logical query.

The diagnostic value is enormous. When the database is slow, the first question is which queries are responsible. Without pg_stat_statements, the answer requires either parsing the query log (often impractical at production volume) or guessing from EXPLAIN ANALYZE on suspected queries. With pg_stat_statements, the answer is one query against a system view, ordered by total_exec_time desc, limit 20.

The cost is small: a few extra bytes of memory per tracked query, a hash table lookup on every query execution, and some serialization overhead during stats updates. The default of tracking 5,000 unique query shapes is enough for most applications and can be raised if needed. We have never seen pg_stat_statements be the bottleneck.

The discipline that makes pg_stat_statements valuable rather than just present is periodic review. Schedule a weekly look at the top 20 queries by total time, mean time, and rows returned. Most weeks nothing changes; the weeks when a new query enters the top 20 are the early warning that a recent deploy introduced a problem before customers notice.

pgcrypto: useful but often the wrong default

pgcrypto provides cryptographic functions: hashing, symmetric encryption, asymmetric encryption, secure random generation. It is widely installed and widely misused.

The common misuse is column-level encryption with pgcrypto's symmetric functions. This is appealing because it is easy: pass the data through pgp_sym_encrypt() on insert and pgp_sym_decrypt() on select. The problem is that the encryption key has to live somewhere the database can read it, which means either in a configuration file on the same host as the database (where an attacker who can read the data file can also read the key) or in a function definition (where it is visible to anyone with the privilege to read function bodies). The encryption provides little additional security in most threat models.

The legitimate use of pgcrypto is for hashing passwords with crypt() and bcrypt, for generating secure random tokens with gen_random_uuid() or gen_random_bytes(), and for signature verification in specific application protocols. The encryption functions are correct as primitives but rarely the right level to apply encryption at: application-layer encryption with proper key management is almost always better.

The other use that holds up is digest() for content hashing in cases where the application can use a database-side hash (deduplication checks, content-addressed storage). The performance is better than fetching the data, hashing it in application code, and writing back.

pg_trgm: full-text-adjacent search without full-text infrastructure

pg_trgm provides trigram-based similarity matching: it breaks strings into three-character sequences and compares similarity by overlap. With a GIN or GiST index on a trigram-extracted column, queries like WHERE name % 'searchterm' become fast even on millions of rows. It supports both similarity matching and fuzzy LIKE patterns.

The value proposition is that you get search-engine-like behavior on text columns without standing up Elasticsearch. The constraints are that pg_trgm does not understand language structure (no stemming, no stop words, no synonyms), and the trigram index is large compared to a B-tree on the same column.

For autocomplete, fuzzy customer search, typo-tolerant lookups, and similar use cases at small to medium scale, pg_trgm is the right tool. When you genuinely need linguistic search across millions of documents, the right tool is full-text search or a dedicated search engine. The judgment is about scale and feature requirements; many teams reach for the dedicated search engine when pg_trgm would have served them for the lifetime of the product.

pgvector: the right answer for vector similarity, with caveats

pgvector adds a vector data type and similarity operators (L2 distance, inner product, cosine similarity) with HNSW and IVFFlat indexes. It has become the dominant Postgres extension for AI workloads because it lets vector search live next to your relational data instead of requiring a separate vector database.

The case for pgvector is straightforward: if you already have a Postgres database and your vector workload fits on the same hardware, pgvector eliminates an entire system from your architecture. The cost is real but bounded; the index types are good enough for most use cases, and the recall/latency tradeoffs are well-understood.

The case against is when the vector workload is large enough to dominate the database. Vector indexes can be substantial, queries can be CPU-heavy, and a large vector workload on the same instance as your transactional workload can degrade both. The threshold where dedicated vector infrastructure earns its complexity is higher than vendor pitches suggest but not infinite. The signal to graduate is when vector queries become a measurable share of CPU on the primary database.

PostGIS: spatial when you need it

PostGIS is the canonical spatial database extension and is one of the most mature in the Postgres ecosystem. It adds geometric data types, spatial indexes, distance and intersection operators, projection transformations, and a large library of GIS functions.

The case for PostGIS is when you have spatial data that genuinely needs spatial queries: "find all stores within 5km of this point," "what region does this address fall into," distance-ranked search. The case against is when you only need bounding-box lookups or simple lat/lon storage, where a regular B-tree index on coordinates can be enough.

PostGIS is one of the few extensions whose installation cost is high: it adds substantial schema complexity, the function library is enormous, and the spatial indexes have their own tuning considerations. Once installed it tends to spread through the application as more code starts to use spatial features. This is fine when spatial is core to the product; it is overkill when spatial is one feature among many.

TimescaleDB: time-series patterns inside Postgres

TimescaleDB adds automatic partitioning by time, continuous aggregates, retention policies, and a query planner that understands time-series access patterns. It targets the workload of high-volume time-stamped data: metrics, IoT telemetry, financial ticks, log aggregation.

The win at the right scale is large: queries that scan a small time range can skip most partitions, retention policies drop old data via partition drops rather than DELETE statements (orders of magnitude faster), continuous aggregates maintain pre-computed rollups that the planner uses transparently. The cost is a more complex deployment, license terms that are not pure open source for some features, and a partitioning model that has to be designed up front and is awkward to change.

The threshold where TimescaleDB earns its complexity is roughly when you have a single table approaching a billion rows with a clear time-ordering and a clear retention requirement. Below that, plain Postgres with manual partitioning or simply aggressive indexing handles the workload. We have never crossed that threshold in our products; we have written several time-series tables in plain Postgres without partitioning and they have aged fine.

The extensions to install conservatively

Some extensions are useful for specific tasks but should not be installed by default. pg_repack is excellent for online table reorganization but should only be installed when you actually need to repack a table; the rest of the time it adds attack surface and upgrade complexity. hypopg is useful for testing index candidates without actually creating them; install it in development databases, not production. auto_explain is invaluable when debugging specific slow query problems but should be enabled with care because it logs query plans to the database log and can produce huge amounts of output.

The pattern that runs through these is that the extension exists for a specific operational task. Install it when you need it, use it for the task, consider uninstalling it afterwards. The discipline against extension creep is that every extension you have installed needs to be remembered when something goes wrong.

The deeper observation

The Postgres extension system is one of the great pieces of database engineering: it lets the core remain small and conservative while letting the ecosystem move quickly. The success of pgvector in three years is impossible to imagine in MySQL or SQL Server, where vector search would require waiting for the vendor to ship a major version. The cost is that the production discipline around extensions is something you have to develop yourself; nothing about the system forces you to install fewer.

The principle that holds up is that each extension is a permanent dependency that takes on a life of its own. The right question to ask before installing is not whether the extension does something useful, but whether the team is prepared to maintain it for the lifetime of the database. The extensions that pass that test are a small set; the extensions that fail it are a much longer list.

Postgres Extensions in Production: pg_stat_statements, pgcrypto, and the Ones That Earn Their Cost

Anethoth

Why extensions have operational cost

pg_stat_statements: the one every production database should have

pgcrypto: useful but often the wrong default

pg_trgm: full-text-adjacent search without full-text infrastructure

pgvector: the right answer for vector similarity, with caveats

PostGIS: spatial when you need it

TimescaleDB: time-series patterns inside Postgres

The extensions to install conservatively

The deeper observation

Read more

The Forgotten History of the Bicycle Wheel: How Wire Spokes Made the Modern Wheel Possible

How Wandering Albatrosses Sleep While Flying: The Strange Neural Engineering of Unihemispheric Slow-Wave Sleep

Designing API Webhook Deactivation: When and How to Stop Calling Endpoints That Persistently Fail

Postgres pg_stat_statements_info: Tracking the Statistics Collector's Own Health