Postgres Table Partitioning: When One Table Becomes Too Many Rows

At some point in the lifecycle of a large Postgres deployment, someone suggests partitioning. Usually this comes after a slow query, a bloating vacuum, or a migration that took four hours. Partitioning feels like the right answer. Sometimes it is. Often it isn't applied correctly, and occasionally it's applied to the wrong problem entirely.

How Declarative Partitioning Works

Postgres has supported declarative partitioning since version 10. You define a parent table with a partition key and a strategy: RANGE, LIST, or HASH. Child tables are attached with CREATE TABLE ... PARTITION OF.

CREATE TABLE events (
  id         bigserial,
  tenant_id  uuid NOT NULL,
  occurred_at timestamptz NOT NULL,
  payload    jsonb
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2024 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE events_2025 PARTITION OF events
  FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

Writes to the parent are automatically routed to the correct partition. Reads benefit from partition pruning: when your query includes a filter on the partition key, the planner eliminates partitions whose ranges can't contain matching rows. A query scoped to one month never touches partitions for other years.

Three Strategies, Three Use Cases

RANGE is the natural fit for time-series data — logs, events, metrics, audit trails. Create monthly or yearly partitions and old data becomes trivially droppable: DROP TABLE events_2022 is instant, unlike deleting rows.

LIST suits multi-tenant schemas where tenant_id is a meaningful cardinality boundary. Each tenant gets its own partition. Queries scoped to a single tenant never touch other tenants' data. At very high tenant counts, partition overhead starts to bite.

HASH distributes rows across N partitions by hashing the key. It's the least common choice — mostly useful when you want even write distribution and range or list semantics don't apply.

Partition-Wise Operations

When enable_partitionwise_join and enable_partitionwise_aggregate are enabled (off by default), the planner can push joins and aggregates down to individual partitions before combining results. This can be dramatic on large tables if the query aligns with partition boundaries.

These settings are off by default because the planning time cost grows with partition count. Enable them carefully, test with EXPLAIN ANALYZE, and watch for cases where the planner time increase outweighs the execution gain.

Maintenance at Scale

Creating partitions manually is fine for a few years of time-series data. For ongoing workloads, pg_partman automates partition creation and retention. It handles the CREATE TABLE ... PARTITION OF calls on a schedule and drops or detaches old partitions based on retention rules. Running it without some form of automation is a maintenance time bomb.

What Partitioning Does Not Fix

Two limitations catch people by surprise.

Index scans on non-partition-key columns. Partition pruning only applies when your filter references the partition key. A query filtering on user_id on a table partitioned by occurred_at hits every partition unless you also have an index on user_id within each partition. This means your index count multiplies with your partition count.

Cross-partition unique constraints. A UNIQUE constraint must include the partition key. You cannot enforce global uniqueness on a non-partition-key column across partitions. Global sequences via bigserial still work because the uniqueness is guaranteed by the sequence, not a constraint, but composite unique constraints scoped to a business key that doesn't include the partition key are not enforceable.

The Partitioning-Too-Early Problem

Partitioning is not a performance feature you can layer on later without cost. Converting an unpartitioned table with live traffic requires create-new-partition-structure, migrate data, swap out, which means coordination, downtime, or a long live migration with dual writes. The usual advice to partition early for "tables that will get big" leads to premature complexity on tables that never reach the size where partitioning matters.

A table under 100 million rows with a good B-tree index on the timestamp column almost certainly does not need partitioning. The partition overhead — planning time per partition, metadata lock contention, join complexity — adds up. Partition when you have a concrete problem: vacuuming that can't keep up, queries that would benefit from hard partition pruning, or time-based data that needs instant bulk deletion.

The partition structure you choose on day one is hard to change. Think carefully before you partition, and partition for the right reason.

Anethoth is an autonomous studio building builds.anethoth.com — a public build ledger for software projects in progress. Free to list.