API Pagination Beyond Lists: Patterns for Trees, Graphs, and Hierarchies
Standard cursor pagination assumes a linear ordered list. Tree, graph, and hierarchical resources break that assumption, and the patterns that handle them well are not obvious extensions of the flat-list case.
Almost every pagination tutorial starts with a flat list ordered by timestamp or ID. That covers the majority of REST endpoints in the wild. It does not cover the resources that gave their authors the most trouble: trees of comments, graphs of related entities, hierarchies of organizational units, and threaded conversations. The standard cursor pattern does not extend cleanly to these cases, and the patterns that do extend are less widely documented.
We hit these cases across DocuMint (invoice line-items nested under invoices), CronPing (monitor groups with hierarchical permissions), FlagBit (targeting rule trees), and WebhookVault (webhook deliveries grouped by endpoint then by event). Each forced a non-obvious pagination decision.
The tree case
A comment thread or organizational hierarchy is a tree. The natural way to display it is depth-first, with indentation showing parent-child relationships. The natural way to paginate it is to page by top-level node and return the full subtree for each top-level node returned. This means a single page response can contain a variable number of total rows, depending on the depth and branching of the subtrees on that page.
The implementation is a recursive CTE that fetches the top-level nodes for the page, then expands each into its subtree:
WITH RECURSIVE roots AS (
SELECT * FROM comments
WHERE parent_id IS NULL AND post_id = $1
ORDER BY created_at DESC, id DESC
LIMIT $2 OFFSET $3
),
subtrees AS (
SELECT c.*, 0 AS depth FROM roots c
UNION ALL
SELECT c.*, s.depth + 1 FROM comments c
JOIN subtrees s ON c.parent_id = s.id
)
SELECT * FROM subtrees ORDER BY thread_path;The cursor is on the top-level nodes only. The response shape gains a depth field per row, and the client renders the tree by walking the flat array in order. Total result size per page is bounded by limiting the maximum subtree size and offering a "load more replies" subroutine for trees that exceed the limit.
The wrong approach is to flatten the tree into a single list ordered by some scalar (creation time, path) and paginate that. The cursor stability problem is severe: a new reply added deep in the tree changes the order of every subsequent comment in the flat list, breaking cursor semantics in a way users notice as "comments disappearing during pagination."
The graph case
A graph of related entities (users following users, products that go together) is harder than a tree because there is no natural root or ordering. Pagination requires a starting node and a traversal strategy. The two patterns that work are bounded-radius traversal and breadth-first frontier paging.
Bounded-radius traversal returns all nodes within K hops of the starting node. It is the right pattern when K is small and the local neighborhood is bounded (most social-graph operations fit this). The implementation is a recursive CTE with depth limit:
WITH RECURSIVE neighborhood AS (
SELECT id, 0 AS depth FROM users WHERE id = $1
UNION
SELECT f.follower_id, n.depth + 1
FROM follows f
JOIN neighborhood n ON f.followee_id = n.id
WHERE n.depth < $2
)
SELECT * FROM neighborhood;Breadth-first frontier paging is the right pattern when K is large or unbounded. The client maintains a frontier set of nodes-to-visit, and each page request returns the next batch of nodes plus the new frontier. The server is essentially stateless: the cursor encodes the frontier. This is how Twitter's followers API and similar large-graph traversals work, though the actual cursor encoding is more complex than the basic recursive-CTE pattern handles.
The hierarchical case
A hierarchy of organizations or categories is structurally a tree, but with two pagination dimensions: pages within a level and drill-down between levels. The right pattern is one paginated endpoint per level, with parent IDs in the request and child collections paginated independently:
GET /organizations?cursor=...
GET /organizations/{id}/teams?cursor=...
GET /organizations/{id}/teams/{id}/members?cursor=...The temptation is to provide a single endpoint that returns the whole hierarchy in one call. This works at small scale and fails badly at larger scale: one customer with a deeply-nested hierarchy can produce responses much larger than the API gateway can handle, often without any indication in the client code that the call would be expensive. Per-level endpoints force the client to ask for what it can render, with the side benefit of natural cache boundaries.
The grouped case
Resources that are naturally grouped (deliveries grouped by endpoint, events grouped by type) are the most common case after flat lists. The right pattern depends on whether the group is the primary access unit or the secondary one. If users browse by group most of the time, paginate the groups and return all items per group up to a per-group cap. If users browse the flat stream most of the time, paginate the flat stream and let the client cluster client-side.
WebhookVault's first implementation tried to do both: a single endpoint that returned grouped deliveries with pagination of both groups and items within groups. The complexity was substantial: cursor encoding needed to capture position-in-group plus position-in-list-of-groups, and edge cases around groups with new items added during pagination produced visible bugs. The refactor was to two separate endpoints with single-dimension pagination each, which removed the cursor-complexity entirely.
The cursor stability problem
All non-trivial pagination shapes have a cursor stability problem: what happens when the underlying data changes during pagination? For flat lists with append-only data, the answer is that new items added after pagination started simply do not appear in the current scan, which is usually acceptable. For trees and graphs, the answer is much harder, because additions can change the structure that pagination is walking.
The defensive pattern is snapshot pagination: the cursor captures a server-side timestamp or version, and subsequent pages are served from data as it was at that snapshot. This requires the underlying data to support point-in-time queries, which is straightforward in Postgres with the right access patterns but expensive to retrofit. The aggressive pattern is to accept inconsistency and document it: pagination over actively-changing data is best-effort.
For our four products, we documented that pagination over actively-changing data is best-effort, with the snapshot pattern reserved for export endpoints that customers depend on for accuracy. The trade-off is reasonable for most B2B SaaS use cases, where the user is paginating their own data and the staleness window is small.
What works in practice
The decision tree we use across our four products is:
- Flat list with stable ordering: cursor pagination with limit+1 has_more trick.
- Tree where users browse top-level items: paginate top-level items, return full subtree per item with a depth cap.
- Graph with bounded local neighborhood: K-hop traversal with K as a query parameter.
- Graph with unbounded traversal: breadth-first frontier with cursor encoding the frontier.
- Hierarchy: one paginated endpoint per level.
- Grouped resources: two single-dimension endpoints, not one two-dimension endpoint.
The deeper observation is that pagination is one of those API design problems where the standard answer covers 80 percent of cases and the remaining 20 percent contains most of the operational surprises. Investing in the right pagination shape early is much cheaper than retrofitting it once customers have written integrations against the wrong shape.