Designing API Webhook Delivery Batching: When to Coalesce Events Into Single Payloads

Single-event-per-delivery is the right default for webhooks, and most B2B SaaS providers ship nothing else. There are, however, specific scenarios where batching produces large operational wins, and it is worth being explicit about where batching helps and where it hurts.

The standard webhook model is one event per HTTP delivery. A customer creates a resource, the provider emits one webhook. A customer pays an invoice, the provider emits one webhook. A flag rule changes, the provider emits one webhook. The model is simple on both sides, which is why almost every major B2B SaaS webhook product ships nothing else. Stripe, GitHub, Linear, Shopify, and Twilio all default to single-event delivery, and their customers' handlers are written on the assumption of one event per HTTP request.

There are specific scenarios where the single-event default produces large operational costs on both sides. High-frequency events like CronPing monitor pings or FlagBit evaluation logs can produce thousands of webhooks per minute under load. Each webhook is a separate HTTP request: a new TCP connection and TLS handshake unless the sender reuses connections (or a new stream on a shared HTTP/2 connection), plus separate signature verification, separate parsing, a separate database row, and a separate ack response. The per-event overhead is small, but it multiplies with volume. At thousands of events per minute, the overhead can exceed the actual event-processing time on both sides.

What batching actually does

Batched webhook delivery coalesces multiple events into a single HTTP request whose JSON body carries an array of events. The customer-side handler iterates the array. The provider-side delivery worker dequeues a batch of events from the per-subscription queue, packages them into one POST, signs the whole body, sends it, and acks every event in the batch on a single 2xx response. The wire-level efficiency win is substantial: one request, one connection setup and TLS handshake, one signature computation, and one database row for the batch instead of one of each per event.
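A minimal Python sketch of the provider side, assuming HMAC-SHA256 over the raw body as the signing scheme (common for webhooks, but not specified here). It serializes a dequeued batch into one `{"events": [...]}` body and computes one signature over all of it. The `Event` class, the `data` field, and the `X-Webhook-Signature` header name are hypothetical, not any particular provider's API.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass
class Event:
    event_id: str
    event_type: str
    occurred_at: str  # ISO 8601 timestamp
    data: dict        # hypothetical: the single-event payload fields


def build_batch_body(events: list[Event]) -> bytes:
    """Serialize a batch as one JSON object with a top-level events array."""
    payload = {
        "events": [
            {
                "event_id": e.event_id,
                "event_type": e.event_type,
                "occurred_at": e.occurred_at,
                "data": e.data,
            }
            for e in events
        ]
    }
    # Compact separators and sorted keys keep the signed bytes deterministic.
    return json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")


def sign_body(secret: bytes, body: bytes) -> str:
    """One HMAC-SHA256 over the whole batch body, hex-encoded."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def prepare_delivery(secret: bytes, events: list[Event]) -> tuple[bytes, dict[str, str]]:
    """Build the body and headers for one batched POST."""
    body = build_batch_body(events)
    headers = {
        "Content-Type": "application/json",
        "X-Webhook-Signature": sign_body(secret, body),  # hypothetical header
    }
    return body, headers
```

The signature covers the exact bytes sent, so the receiver must verify against the raw body before parsing it.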

The customer-side simplicity loss is real. Single-event handlers can be stateless processors that take one event and apply one update. Batch handlers must iterate, may need to deduplicate within the batch, must handle the partial-failure case where some events processed cleanly and others raised exceptions, and need to decide whether to ack the batch as a whole or implement per-event acknowledgment. The contract is more complex to document, more complex to test, and more complex to debug.
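A customer-side sketch of that extra work, assuming the same HMAC-SHA256-over-raw-body scheme and an `{"events": [...]}` body; `handle_batch` and the `apply_event` callback are hypothetical names. It verifies the signature once, drops duplicates within the batch, keeps going past individual failures, and then picks one status code for the whole batch.

```python
import hashlib
import hmac
import json
from typing import Callable


def handle_batch(
    secret: bytes,
    body: bytes,
    signature: str,
    apply_event: Callable[[dict], None],
) -> int:
    """Process one batched delivery and return the HTTP status to send back."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature):
        return 401  # bad signature: a 4xx tells the provider not to retry
    try:
        events = json.loads(body)["events"]
    except (ValueError, KeyError, TypeError):
        return 400  # schema violation: also not retryable

    seen: set[str] = set()
    failures = 0
    for event in events:
        if event["event_id"] in seen:
            continue  # duplicate inside the same batch: apply it once
        seen.add(event["event_id"])
        try:
            apply_event(event)
        except Exception:
            failures += 1  # keep going so one bad event does not block the rest

    # Whole-batch acknowledgment: any failure asks for a retry of the batch,
    # which is only safe if apply_event is idempotent by event_id.
    return 500 if failures else 200
```

The alternative, per-event acknowledgment, would need a response body listing which event IDs succeeded; the sketch takes the simpler whole-batch route.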

When batching pays back

Batching earns its complexity in four specific scenarios. The first is high-volume telemetry, where per-event overhead dominates actual processing time and the customer aggregates across events anyway. CronPing successful-ping events fit this: most customers aggregate pings into uptime metrics rather than processing each ping individually. The second is fan-out, where a single business event produces many related webhook events that the customer will process together. FlagBit bulk flag updates fit this: changing 50 flags via a bulk API produces 50 evaluation-cache-invalidation events that the customer wants to apply atomically.
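For the fan-out case, here is a sketch of how a customer might apply a batch of cache-invalidation events atomically. The technique is copy-on-write: build the new cache state off to the side, then publish it with one reference swap. `FlagCache` and the `flag_key` event field are hypothetical. This is not FlagBit's SDK or payload format.

```python
import threading
from typing import Optional


class FlagCache:
    """Read-mostly cache of evaluated flags, keyed by flag key.

    Readers always see one consistent snapshot. A batch of invalidations
    is applied by building a new snapshot and swapping it in once.
    """

    def __init__(self, values: dict[str, bool]) -> None:
        self._snapshot = dict(values)
        self._write_lock = threading.Lock()  # serializes writers only

    def get(self, flag_key: str) -> Optional[bool]:
        # A miss means the caller re-evaluates the flag.
        return self._snapshot.get(flag_key)

    def apply_invalidation_batch(self, events: list[dict]) -> int:
        """Drop every flag named in the batch in a single visible step."""
        with self._write_lock:
            updated = dict(self._snapshot)
            removed = 0
            for event in events:
                if updated.pop(event["flag_key"], None) is not None:
                    removed += 1
            # Rebinding the attribute is one step in CPython: readers see
            # either the old snapshot or the new one, never half a batch.
            self._snapshot = updated
            return removed
```

With single-event delivery the same customer would see 50 separate partial states while the bulk update trickles in; the batch boundary gives the handler a natural unit to publish.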

The third scenario is recovery from an outage, where a customer's receiver was down and the provider needs to deliver thousands of queued events. At 5 deliveries per second, single-event delivery of a 10,000-event backlog takes 2,000 seconds, roughly 33 minutes. At the same request rate, batches of 50 need only 200 requests and drain the same backlog in about 40 seconds. The fourth scenario is integrations with a high cost per delivery, where the customer pays per webhook call to a downstream system and batching directly reduces that cost.
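The backlog arithmetic as a small helper. It assumes delivery is limited by requests per second (for example, a per-receiver rate limit) rather than by bytes. That is the condition under which batching cuts drain time by roughly the batch size.

```python
import math


def drain_seconds(backlog: int, requests_per_second: float, batch_size: int = 1) -> float:
    """Time to deliver a backlog at a fixed request rate, batch_size events per request."""
    requests = math.ceil(backlog / batch_size)  # a partial final batch still costs a request
    return requests / requests_per_second


single = drain_seconds(10_000, 5)                  # 2000.0 seconds, about 33 minutes
batched = drain_seconds(10_000, 5, batch_size=50)  # 40.0 seconds
```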

When batching hurts

Batching hurts when the events are independent and high-value. A customer paying for a course wants the enrollment webhook delivered as soon as possible, not coalesced into a batch with nine other enrollments and held back by the batching window. The latency cost of batching is real. A batching scheme waits for additional events before sending, so an event can sit for up to one full window, and under light traffic, when batches rarely fill, most events wait close to that long.

Batching also hurts when the events have heterogeneous types and the customer dispatches on type. Single-event delivery with the event type in a header lets the customer's reverse proxy route each request directly to the right handler. Batched delivery with mixed types forces the handler to iterate, dispatch each event, and recombine the results. That is the same per-event dispatch work as single-event delivery plus the batch complexity, for no operational win.

The minimum viable batching design

The minimum design that works has a few pieces. The customer opts into batching per subscription with explicit batch_size and batch_window parameters. The batch_size has a hard upper bound (most providers that offer batching cap it somewhere between 100 and 1000 events). The batch_window has a hard upper bound too (typically 5 to 60 seconds of maximum wait). The provider sends a batch as soon as either threshold is hit: batch_size events have accumulated, or batch_window has expired.
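A sketch of the size-or-window rule for one subscription's queue. `SubscriptionBatcher` is a hypothetical name. The default caps of 1000 events and 60 seconds are placeholders taken from the upper ends of the ranges above. Time is passed in as `now` so the logic is deterministic, and a real worker would call `poll` from a timer.

```python
from typing import Any, Optional


class SubscriptionBatcher:
    """Accumulates events for one subscription and flushes on size or window."""

    def __init__(self, batch_size: int, batch_window: float,
                 max_size: int = 1000, max_window: float = 60.0) -> None:
        # Enforce the provider's hard caps on the customer's opt-in settings.
        if not 1 <= batch_size <= max_size:
            raise ValueError(f"batch_size must be between 1 and {max_size}")
        if not 0 < batch_window <= max_window:
            raise ValueError(f"batch_window must be in (0, {max_window}] seconds")
        self.batch_size = batch_size
        self.batch_window = batch_window
        self._pending: list[Any] = []
        self._window_started: Optional[float] = None

    def add(self, event: Any, now: float) -> Optional[list[Any]]:
        """Queue an event; return a full batch if this event fills it."""
        if not self._pending:
            self._window_started = now  # the window opens with the first event
        self._pending.append(event)
        if len(self._pending) >= self.batch_size:
            return self._flush()
        return None

    def poll(self, now: float) -> Optional[list[Any]]:
        """Return a partial batch once its window has expired, else None."""
        if self._pending and now - self._window_started >= self.batch_window:
            return self._flush()
        return None

    def _flush(self) -> list[Any]:
        batch, self._pending = self._pending, []
        self._window_started = None
        return batch
```

Opening the window at the first pending event bounds the added delay for any event at batch_window plus the polling interval, which is the latency cost described in the previous section.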

The HTTP body is a JSON object with a top-level events array. Each element has the same shape a single-event payload would have, plus explicit event_id, event_type, and occurred_at fields. The signature is computed over the whole body. The customer acks the batch with a single 2xx response. A 4xx response means the batch is bad and should not be retried (signature failure, schema violation). A 5xx response means the batch failed and should be retried as a whole, not event by event.
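The response rules map onto a small classifier on the provider side. This sketch follows the rule as stated, so every 4xx, including 429, is discarded. Providers often treat 429 Too Many Requests as retryable, and that would be a one-line change. Treating timeouts and unexpected codes as retryable is an assumption the rule above does not cover.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ACK = "ack"          # every event in the batch counts as delivered
    DISCARD = "discard"  # the batch is bad; do not retry it
    RETRY = "retry"      # put the whole batch back on the retry queue


def classify_response(status: Optional[int]) -> Outcome:
    """Map one delivery attempt's result to an outcome for the whole batch.

    status is None when no response arrived (timeout, reset connection);
    treating that like a 5xx is an assumption, not part of the stated rule.
    """
    if status is None:
        return Outcome.RETRY
    if 200 <= status < 300:
        return Outcome.ACK
    if 400 <= status < 500:
        return Outcome.DISCARD  # signature failure, schema violation, ...
    # 5xx, plus any unexpected code (assumption), is retried as a unit.
    return Outcome.RETRY
```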

The retry semantics are simpler than they sound. The customer side enforces idempotency by event_id whether the events arrive batched or individually, so a retry of a batch where some events were already processed is safe. The provider side treats the batch as an atomic unit: ack the batch and the events are delivered, fail the batch and all events go back to the retry queue.
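A small simulation of these semantics, using hypothetical names. On the first attempt the receiver fails partway through the batch. The provider retries the whole batch, and the second attempt skips the events that were already applied. The in-memory `processed` set stands in for the durable idempotency store a real receiver would need. Backoff between attempts is omitted.

```python
class IdempotentReceiver:
    """Customer side: applies each event_id at most once across deliveries."""

    def __init__(self, fail_on_first_attempt: set[str]) -> None:
        self.processed: set[str] = set()  # stand-in for a durable store
        self.applied: list[str] = []
        self._fail_once = set(fail_on_first_attempt)

    def receive(self, batch: list[dict]) -> int:
        for event in batch:
            event_id = event["event_id"]
            if event_id in self.processed:
                continue  # already applied by an earlier delivery of this batch
            if event_id in self._fail_once:
                self._fail_once.discard(event_id)
                return 503  # simulate a transient failure mid-batch
            self.applied.append(event_id)
            self.processed.add(event_id)
        return 200


def deliver_until_acked(batch: list[dict], receiver: IdempotentReceiver,
                        max_attempts: int = 5) -> int:
    """Provider side: the batch is acked or retried as one unit; returns attempts used."""
    for attempt in range(1, max_attempts + 1):
        status = receiver.receive(batch)
        if 200 <= status < 300:
            return attempt  # one ack covers every event in the batch
        if 400 <= status < 500:
            raise ValueError(f"batch rejected with {status}; not retrying")
        # 5xx: the whole batch goes back for another attempt (backoff omitted)
    raise RuntimeError("batch still failing after max_attempts")
```

Each event's side effect lands exactly once even though events e1 and e2 are delivered twice.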

What we built across the four products

Across DocuMint, CronPing, FlagBit, and WebhookVault, three of the four ship batching as an opt-in per-subscription feature.

CronPing has the strongest batching case because high-frequency monitor pings dominate the workload. The default is single-event delivery for monitor-state-change events (up, down, recovered), with batched delivery available for ping-received events at batch_size up to 100 and batch_window up to 10 seconds. Customers opting into batched ping events get an order of magnitude lower webhook bandwidth.
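A sketch of per-event-type opt-in validation in this spirit. The caps for ping events come from the text. The event-type names, the `SubscriptionRequest` shape, and the simplification that every other type (such as state changes) stays single-event only are assumptions for illustration, not CronPing's actual API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical event-type name -> (max batch_size, max batch_window seconds).
# Types not listed here, such as monitor.down, are single-event only.
BATCHABLE_CAPS = {
    "ping.received": (100, 10.0),
}


@dataclass
class SubscriptionRequest:
    event_type: str
    batch_size: Optional[int] = None  # None on both fields means single-event delivery
    batch_window: Optional[float] = None


def validate(req: SubscriptionRequest) -> Optional[str]:
    """Return an error message, or None if the subscription is acceptable."""
    if req.batch_size is None and req.batch_window is None:
        return None  # single-event delivery is always allowed
    caps = BATCHABLE_CAPS.get(req.event_type)
    if caps is None:
        return f"{req.event_type} supports single-event delivery only"
    if req.batch_size is None or req.batch_window is None:
        return "batching requires both batch_size and batch_window"
    max_size, max_window = caps
    if not 1 <= req.batch_size <= max_size:
        return f"batch_size must be between 1 and {max_size}"
    if not 0 < req.batch_window <= max_window:
        return f"batch_window must be in (0, {max_window}] seconds"
    return None
```

Keeping the policy per event type rather than per subscription is what lets urgent state changes and high-volume pings share an endpoint while only the pings pay the batching latency.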

FlagBit ships batching for evaluation-log webhooks with the same opt-in shape. Most customers want individual flag-change events delivered immediately because flag changes are usually low-volume and high-importance. The evaluation-log stream is high-volume and aggregation-friendly, so batching matches the workload.

WebhookVault ships batching for captured-request events with batch_size up to 50 and batch_window up to 5 seconds. The use case is debugging dashboards that ingest webhook captures for visualization — the customer dashboard iterates the batch and updates the visualization once per batch instead of once per event.

DocuMint does not currently ship batching because the invoice-generation completion event is the primary webhook and customers want it delivered immediately. The volume per customer is low enough that single-event delivery has no operational cost worth optimizing.

What batching does not solve

Batching reduces per-event overhead but does not change the fundamental delivery contract. Customers still need idempotency by event ID because retries still happen. Customers still need ordering tolerance because cross-batch ordering is not stronger than single-event ordering. Customers still need failure-mode handling because batches can fail just like individual deliveries.

The decision tree is short: high-volume events where customers aggregate across events benefit from batching, everything else is better served by single-event delivery. Most webhook product surface should be single-event because most events are low-volume and high-importance. The narrow set of high-volume aggregation-friendly event types is where batching earns the cost of building and documenting the more complex contract.

The structural argument we keep returning to: API surface is mostly a question of which complexity to expose to customers. Single-event delivery hides the per-event overhead inside the provider. Batched delivery exposes the batching complexity to the customer in exchange for lower overhead. The right default depends on whether the customer would rather pay the overhead or pay the complexity. For high-volume aggregation-friendly events the complexity is cheaper. For everything else the overhead is cheaper. Both options should exist for the events where the trade-off is real, and only one option should exist for the events where the trade-off is not.


Our products, DocuMint (PDF invoice generation API), CronPing (cron job monitoring with status pages), FlagBit (feature flags API for modern teams), and WebhookVault (webhook capture and replay), put these patterns into production.