SQL Window Functions: Analytics Queries Without the Subquery Mess

Window functions were added to standard SQL in 2003 and to most production databases by 2010, yet many engineers still write running totals and ranked-by-group queries as nested subqueries. The patterns that should be in every backend developer's toolbox.

Window functions are the part of SQL most likely to make an experienced developer say "I had no idea you could do that." They allow per-row calculations that reference other rows without collapsing the result set the way GROUP BY does. Running totals, percentile ranks, "first row per group," moving averages, and gap detection all become single-pass single-query operations rather than tangles of subqueries and joins.

We use window functions across DocuMint, CronPing, FlagBit, and WebhookVault for analytics queries that would otherwise require multi-statement transactions or application-side aggregation. They are universally available — PostgreSQL since 8.4, SQLite since 3.25, MySQL since 8.0, SQL Server since 2005 — and yet they remain underused.

The basic syntax

A window function is a regular SQL function followed by OVER (...). The parenthesized clause defines the window — which rows the function looks at when computing the value for the current row. The simplest case is "all rows," with no PARTITION BY or ORDER BY:

SELECT
  id,
  amount,
  SUM(amount) OVER () AS grand_total,
  amount * 1.0 / SUM(amount) OVER () AS percentage
FROM invoices;

Each row keeps its original values and gets two new computed columns. No subquery, no join, no GROUP BY. The grand total appears on every row because the window spans the whole result set.

Partitions: per-group calculations without collapsing rows

Adding PARTITION BY divides the rows into groups and computes the window function separately within each group:

SELECT
  account_id,
  invoice_id,
  amount,
  SUM(amount) OVER (PARTITION BY account_id) AS account_total,
  COUNT(*) OVER (PARTITION BY account_id) AS account_invoice_count
FROM invoices;

Each row keeps its identity but also carries aggregate information about its group. This is the pattern that makes most "show user X's data alongside their group total" reports trivial.

Ordering and frames: running totals and moving averages

Adding ORDER BY inside the window introduces a sequence and an implicit frame. The default frame, perversely, is "from the start of the partition to the current row," which means SUM with ORDER BY produces a running total:

SELECT
  date,
  daily_revenue,
  SUM(daily_revenue) OVER (ORDER BY date) AS cumulative_revenue
FROM daily_summaries;

For a moving average, specify the frame explicitly:

SELECT
  date,
  daily_revenue,
  AVG(daily_revenue) OVER (
    ORDER BY date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS seven_day_moving_avg
FROM daily_summaries;

The frame specification (ROWS BETWEEN ... AND ...) controls exactly which rows the function sees. Common patterns: UNBOUNDED PRECEDING AND CURRENT ROW for cumulative, N PRECEDING AND CURRENT ROW for backward-looking moving average, CURRENT ROW AND N FOLLOWING for forward-looking.

The ranking functions: ROW_NUMBER, RANK, DENSE_RANK

These three functions assign a number to each row within a partition, ordered by the ORDER BY clause:

SELECT
  account_id,
  invoice_id,
  amount,
  ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY amount DESC) AS rank_by_amount
FROM invoices;

ROW_NUMBER gives strictly sequential numbers (1, 2, 3...) with no ties. RANK gives the same number to tied values and skips subsequent numbers (1, 2, 2, 4). DENSE_RANK gives the same number to tied values but doesn't skip (1, 2, 2, 3). The choice depends on whether you want the next position to reflect the skipped ties.

The most useful pattern: "top N per group" via a CTE with ROW_NUMBER and a WHERE clause:

WITH ranked AS (
  SELECT
    account_id,
    invoice_id,
    amount,
    ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY amount DESC) AS rn
  FROM invoices
)
SELECT account_id, invoice_id, amount
FROM ranked
WHERE rn <= 3;

This is the canonical "top 3 invoices per account" query. Without window functions, you would need a lateral join or a correlated subquery, both of which are slower and more verbose.

LAG and LEAD: looking at adjacent rows

LAG returns the value from a previous row in the partition; LEAD returns the value from a subsequent row. The most useful pattern is computing differences between consecutive rows:

SELECT
  date,
  daily_revenue,
  daily_revenue - LAG(daily_revenue) OVER (ORDER BY date) AS day_over_day_change,
  100.0 * (daily_revenue - LAG(daily_revenue) OVER (ORDER BY date))
    / LAG(daily_revenue) OVER (ORDER BY date) AS percent_change
FROM daily_summaries;

This is the pattern for trend analysis, session detection, and gap finding. With LAG, you can detect rows where the time gap to the previous row exceeds a threshold and use that as session boundaries. With LEAD, you can compute time-to-next-event for each row.

FIRST_VALUE, LAST_VALUE, NTH_VALUE

These return values from specific positions in the window. FIRST_VALUE with a partition gives you the per-group first row:

SELECT
  account_id,
  invoice_id,
  amount,
  FIRST_VALUE(amount) OVER (
    PARTITION BY account_id ORDER BY created_at
  ) AS first_invoice_amount
FROM invoices;

LAST_VALUE requires explicit frame specification to work as expected, because the default frame ends at the current row. To get the actual last value in the partition, specify the full frame:

LAST_VALUE(amount) OVER (
  PARTITION BY account_id
  ORDER BY created_at
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

This is the most common window function gotcha. The default frame is the source of more wrong answers than the rest of the syntax combined.

The performance story

Window functions are typically executed in a single pass over the sorted input, which means they are comparable in cost to an ORDER BY with the same key. PARTITION BY adds a logical grouping but doesn't require multiple passes. For large datasets, window functions are almost always faster than the subquery or self-join equivalent.

The cost surprise is the sort. A window function with ORDER BY requires the data to be sorted by that key. If the same query has an outer ORDER BY on a different key, the database has to sort twice. Indexes can sometimes provide the window's ordering for free, but only if the index covers the partition and order columns.

What window functions do not do

Window functions are computed after WHERE and GROUP BY but before ORDER BY and LIMIT. You cannot filter on a window function result in the WHERE clause; you need a CTE or subquery and then a WHERE on the outer query (the "ROW_NUMBER and then WHERE rn <= 3" pattern).

Window functions also cannot be used in joins or as join conditions. They are purely per-row calculations on the post-aggregation result set.

The deeper observation

Window functions are the part of SQL that turn analytics queries from multi-statement procedures into single declarative statements. The barrier to using them is not complexity — the syntax is regular — but unfamiliarity. Every engineer who works with databases should know the eight or nine patterns above. They make every subsequent SQL query you read and write smaller and clearer, and they are universally supported by modern databases.

Read more