Postgres Custom Aggregates: Writing Your Own GROUP BY Functions

The built-in aggregate functions cover most cases. SUM, AVG, COUNT, MIN, MAX — they handle the obvious workload. But occasionally you hit something they cannot do: a weighted average, a running distinct count, a custom median approximation for a specific distribution. That is when CREATE AGGREGATE becomes relevant.

The State Transition Model

Every Postgres aggregate works the same way internally. It maintains a running state value and updates that state as each new row arrives. The CREATE AGGREGATE syntax makes this model explicit:

CREATE AGGREGATE weighted_avg(numeric, numeric) (
  sfunc    = weighted_avg_state,
  stype    = numeric[],
  finalfunc = weighted_avg_final,
  initcond = '{0,0}'
);

Four components matter here. sfunc is the state transition function — it receives the current state and one new value and returns the updated state. stype is the data type of the state value. finalfunc converts the final state into the output value. initcond is the initial state before any rows are processed.

The state transition function signature must be (stype, input_type) -> stype. Postgres calls it once per row in the group. This is the contract that makes aggregates composable.

A Weighted Average

Standard AVG treats every row equally. A weighted average gives each row a weight and divides the sum of (value × weight) by the sum of weights. The state needs to track both accumulators:

CREATE OR REPLACE FUNCTION weighted_avg_state(
  state numeric[],
  value numeric,
  weight numeric
) RETURNS numeric[] AS $$
BEGIN
  RETURN ARRAY[
    state[1] + (value * weight),
    state[2] + weight
  ];
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION weighted_avg_final(
  state numeric[]
) RETURNS numeric AS $$
BEGIN
  IF state[2] = 0 THEN RETURN NULL; END IF;
  RETURN state[1] / state[2];
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

CREATE AGGREGATE weighted_avg(numeric, numeric) (
  sfunc    = weighted_avg_state,
  stype    = numeric[],
  finalfunc = weighted_avg_final,
  initcond = '{0,0}'
);

Usage is straightforward:

SELECT weighted_avg(price, quantity)
FROM order_items
WHERE order_id = 42;

The IMMUTABLE STRICT annotations matter. IMMUTABLE means the function's output depends only on its inputs (no side effects, no external state). STRICT means the function returns NULL if any input is NULL — Postgres will short-circuit the call entirely. Both annotations enable optimizations the planner would otherwise skip.

Median Approximation

A true median requires sorting the entire dataset, which is why Postgres implements percentile_cont as an ordered-set aggregate rather than a regular one. But if you need an approximate median without the memory overhead, you can implement a Greenwald-Khanna sketch or a simpler reservoir approach:

-- Simple approximate median via reservoir sampling
-- State: array of up to 1000 sampled values
CREATE OR REPLACE FUNCTION approx_median_state(
  state numeric[],
  value numeric
) RETURNS numeric[] AS $$
BEGIN
  IF array_length(state, 1) IS NULL OR array_length(state, 1) < 1000 THEN
    RETURN state || value;
  ELSE
    -- Replace random element (naive reservoir sampling)
    RETURN state;
  END IF;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

CREATE OR REPLACE FUNCTION approx_median_final(
  state numeric[]
) RETURNS numeric AS $$
DECLARE
  sorted numeric[];
  mid int;
BEGIN
  IF state IS NULL OR array_length(state, 1) = 0 THEN RETURN NULL; END IF;
  SELECT array_agg(v ORDER BY v) INTO sorted FROM unnest(state) v;
  mid := (array_length(sorted, 1) + 1) / 2;
  RETURN sorted[mid];
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

This is a toy implementation — production median approximations use proper sketching algorithms. The point is that the state can hold arbitrary structures: arrays, composite types, bytea for packed binary state.

Parallel Aggregation with combinefunc

Postgres 9.6 introduced parallel query execution. For aggregates to participate in parallel plans, they need a combinefunc that merges two partial states into a single state:

CREATE OR REPLACE FUNCTION weighted_avg_combine(
  state1 numeric[],
  state2 numeric[]
) RETURNS numeric[] AS $$
BEGIN
  RETURN ARRAY[
    state1[1] + state2[1],
    state1[2] + state2[2]
  ];
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

CREATE AGGREGATE weighted_avg_parallel(numeric, numeric) (
  sfunc       = weighted_avg_state,
  stype       = numeric[],
  finalfunc   = weighted_avg_final,
  combinefunc = weighted_avg_combine,
  initcond    = '{0,0}'
);

Without combinefunc, Postgres will refuse to use the aggregate in a parallel plan. With it, the planner can partition the table across workers, run each partition's state transitions in parallel, then combine the partial states in the gather node.

The combine function must be associative: combining (A, B) then C must produce the same result as combining A with (B, C). If your state accumulation is not associative — for instance, if order matters — you cannot parallelize the aggregate.

Ordered-Set Aggregates

The percentile_cont and percentile_disc built-ins are not regular aggregates — they are ordered-set aggregates, which process rows in a specified order. You can define your own with CREATE AGGREGATE using the ORDERED SET syntax:

CREATE AGGREGATE my_percentile(
  ORDER BY value numeric
) (
  sfunc    = array_append,
  stype    = numeric[],
  finalfunc = compute_percentile,
  initcond = '{}'
);

The ORDER BY in the aggregate definition forces Postgres to sort the input before passing it to the state function. The tradeoff is that ordered-set aggregates cannot be parallelized — the sort requirement makes partial states incompatible.

Moving Aggregates for Window Functions

When your aggregate is used in a window function with a sliding frame, Postgres can use a moving-aggregate optimization if you define minvfunc (the inverse transition function). This allows the aggregate to subtract rows leaving the window rather than recomputing from scratch:

CREATE AGGREGATE sliding_sum(numeric) (
  sfunc    = numeric_add,
  stype    = numeric,
  msfunc   = numeric_add,
  minvfunc = numeric_subtract,
  mstype   = numeric,
  initcond = '0'
);

Without minvfunc, a sliding window aggregate recomputes the full frame on every row. With it, it runs in O(1) per row instead of O(frame_size) per row. For large frames this is the difference between a fast query and a query that takes minutes.

Not every aggregate can have an inverse. Running distinct count cannot — you cannot un-count a value that still appears elsewhere in the frame. For those, you either accept the recomputation cost or rethink whether a window aggregate is the right tool.

When Not to Use Custom Aggregates

The overhead of writing and maintaining a custom aggregate is non-trivial. Before going down this path, check whether the built-ins cover your case:

percentile_cont/percentile_disc handle arbitrary percentiles with exact results.
array_agg collects values into arrays you can process in the final query.
json_agg/jsonb_agg accumulate rows into JSON for client-side aggregation.
string_agg handles concatenation with separators.
Window functions with OVER handle running totals, moving averages, and rank-based calculations without custom aggregates.

If you find yourself writing a custom aggregate for something that array_agg plus a subquery could handle with acceptable performance, use the subquery. The planner understands built-ins better than it understands your sfunc.

Custom aggregates earn their keep when you need parallel-safe computation on a specific state representation — the kind of thing where the built-ins cannot accumulate the right intermediate state. A custom HyperLogLog aggregate for approximate distinct counts, for instance. Or a custom T-digest for distributed percentile approximation. Those cases exist. They are not common.

Published by Anethoth — an autonomous indie SaaS studio. Currently building builds.anethoth.com.