Postgres GROUPING SETS, ROLLUP, and CUBE: Multi-Level Aggregations in One Query
GROUP BY answers one question. GROUPING SETS, ROLLUP, and CUBE answer several at once in a single pass over the data. Most teams reach for UNION ALL of multiple GROUP BY queries when one of these extensions would have done the work with one table scan and clearer intent.
The standard GROUP BY answers exactly one question: what is the aggregate per group at one level of granularity? Reports usually want more than one answer at once—revenue per product, revenue per category, and revenue overall, all displayed together. The natural-looking solution is several GROUP BY queries combined with UNION ALL. That works, but it scans the source data once per level, and the SQL grows uglier than the reporting question deserves.
GROUPING SETS, ROLLUP, and CUBE are SQL-standard extensions that compute multiple aggregation levels in a single query, with one scan of the source data. Postgres has supported them since version 9.5, and most teams under-use them because the syntax looks unusual and the documentation hides them under a heading most developers do not browse. The features pay off most for reports, dashboards, and any place where you want subtotals next to detail rows without orchestrating multiple queries.
GROUPING SETS: the general primitive
GROUPING SETS lets you specify multiple grouping levels and have Postgres compute aggregates for all of them in one query. The result set is the union of the per-level aggregates, with NULLs in the columns not included in each particular grouping.
SELECT region, product, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS (
(region, product),
(region),
(product),
()
);This produces four logical sub-results combined into one output: revenue per region-product combination, revenue per region (with product NULL), revenue per product (with region NULL), and grand total (both NULL). Internally Postgres makes a single pass over the source, computing all the aggregates as it goes. The cost is roughly proportional to one scan plus the cost of maintaining multiple hash tables, not the multiple-scan cost of a UNION ALL approach.
ROLLUP: hierarchical subtotals
ROLLUP is syntactic sugar for a common GROUPING SETS pattern: progressive subtotals from the most-detailed level up through the grand total. ROLLUP (a, b, c) is shorthand for GROUPING SETS ((a,b,c), (a,b), (a), ())—dropping one column at a time from right to left.
The typical use case is a hierarchical report. Sales by year-quarter-month with quarterly and yearly subtotals is ROLLUP (year, quarter, month). Revenue by department-team-employee with team and department subtotals is ROLLUP (department, team, employee). The output order matches the natural reading order of the hierarchy, with subtotals appearing between the detail rows they summarize.
CUBE: every combination
CUBE produces aggregates for every subset of the grouping columns—the full cross-product. CUBE (a, b, c) is shorthand for GROUPING SETS over all eight combinations from (a,b,c) down to (). The number of grouping sets grows as 2^N in the number of columns, so CUBE on four columns produces 16 levels and CUBE on five produces 32.
CUBE is right when the report consumer wants to slice the data along every dimension independently—a pivot-table view where the user can collapse any combination of dimensions. CUBE is wrong when the consumer only needs specific combinations, in which case explicit GROUPING SETS keeps the result smaller and the query intent clearer.
The GROUPING function
The result rows from a multi-level aggregation have NULLs in the columns that are not part of that level's grouping. This causes a problem if the data itself contains NULLs—you cannot tell whether a NULL in the output means "grouped over this dimension" or "this dimension's value was NULL in the source."
The GROUPING function disambiguates. GROUPING(column) returns 1 if the column is part of the current grouping set's NULL-aggregation, 0 if it is a real grouping column. The right pattern in report output is CASE WHEN GROUPING(region) = 1 THEN 'All regions' ELSE region END—replacing the synthetic NULL with a meaningful label.
Performance notes
Multi-level aggregations in one query are usually faster than the equivalent UNION ALL of multiple queries because the source data is scanned once. The cost saving is most visible on large source tables; on small tables the planning overhead can dominate and the multi-query approach is competitive.
Memory usage scales with the number of grouping sets and the cardinality of each. A CUBE over five high-cardinality columns can use substantial memory because every distinct combination needs an entry in some hash table. Setting work_mem appropriately and watching for hash spills in EXPLAIN ANALYZE is the right discipline.
HashAggregate vs. GroupAggregate matters here. Postgres chooses GroupAggregate (which requires sorted input) for small grouping sets and HashAggregate (which builds a hash table per set) for larger ones, with a switchover threshold based on work_mem. The planner's choice is usually right; the override knobs (enable_hashagg, enable_groupagg) exist for diagnostic comparison rather than production tuning.
What this does not do
Multi-level aggregations do not produce a relational result in the cleanest sense—the NULLs in the output break the relational model and require GROUPING-function disambiguation to be interpretable. Reports that consume the output as report data are fine; downstream queries that try to filter or join on the result get awkward fast.
These features do not magically make slow aggregations fast. If a single GROUP BY against the table is slow, the multi-level version will be slow too. Indexes that help one level help the others; predicates that limit the rows scanned help all levels. The speedup vs. UNION ALL is the scan-once factor, not a different cost model.
Across our four products
Our products run on SQLite, which does not support GROUPING SETS or ROLLUP or CUBE natively. The SQLite analog is multiple GROUP BY queries combined with UNION ALL, which is correct but slower and uglier than the Postgres equivalent. For the reports we currently generate in DocuMint (per-customer revenue summaries), CronPing (uptime rollups across monitors and time windows), FlagBit (flag-evaluation breakdowns by environment and rule), and WebhookVault (delivery success rates per endpoint and event type), the multi-query approach is fine at current data volumes. The eventual Postgres migration will let us collapse several of those report queries to single multi-level aggregations.
The reporting endpoints are the natural candidates for adoption: dashboard summaries that show overall numbers alongside per-category breakdowns are exactly the shape ROLLUP was designed for. The internal admin tools that aggregate across customers will use CUBE-style queries for ad-hoc slicing.
The discipline of using GROUPING SETS rather than UNION ALL where the workload supports it is mostly about query clarity. The performance benefit at small data volumes is small. The clarity benefit is large—one query expressing one report intent, rather than three queries combined with UNION ALL that future maintainers have to re-derive the unifying logic for. SQL features that compose better than the ad-hoc alternative usually win on maintenance even when they tie on performance.
Our products: DocuMint (PDF invoice generation API), CronPing (cron job monitoring with status pages), FlagBit (feature flags API for modern teams), and WebhookVault (webhook capture and replay) put these patterns into production.