Designing API Webhook Concurrency Limits: How Many Parallel Deliveries Is the Right Number

Webhook delivery is conventionally framed as a retry-and-backoff problem, and most provider documentation reflects that framing. The retry schedule, the maximum attempts, the dead-letter behavior all get prominent treatment. Concurrency policy is rarely discussed at all, even though it is what determines whether your delivery system can keep up under traffic spikes and whether customer receivers get hammered faster than they can process. The choice has implications for receiver-side reliability, your own operational cost, and the contract you implicitly offer customers about delivery latency.

What concurrency means in webhook delivery

Concurrency is the number of in-flight HTTP requests to a single receiver at any given moment. It is separate from throughput (events delivered per second) and ordering (whether earlier events arrive before later events). A delivery system with concurrency 1 sends one request at a time, waits for the response, then sends the next. A delivery system with concurrency 10 has up to ten requests in flight to the same receiver simultaneously.
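The definition is easier to see in a small sketch. This one is Python and assumes an asyncio-based sender. `deliver_all` and its inner `send` are hypothetical stand-ins for a real HTTP client. An `asyncio.Semaphore` caps the in-flight requests to one receiver, and a counter records the peak that actually occurred.

```python
import asyncio

async def deliver_all(events, concurrency, latency_s=0.01):
    """Deliver every event to one receiver, never exceeding `concurrency` in flight.

    Returns the peak number of simultaneous in-flight requests observed.
    """
    slots = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def send(event):
        # Hypothetical stand-in for an HTTP POST of `event` to the receiver.
        nonlocal in_flight, peak
        async with slots:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(latency_s)  # simulated receiver response time
            in_flight -= 1

    await asyncio.gather(*(send(e) for e in events))
    return peak

if __name__ == "__main__":
    print(asyncio.run(deliver_all(range(20), concurrency=1)))   # prints 1
    print(asyncio.run(deliver_all(range(20), concurrency=10)))  # prints 10
```

At concurrency 1, the 20 simulated requests take roughly 20 × 10 ms. At concurrency 10 they take roughly 2 × 10 ms. That difference is the throughput that concurrency buys, and the receiver pays for it by absorbing the simultaneous load.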

The right concurrency depends on what the receiver can tolerate. Most webhook receivers are application servers behind load balancers, designed for human-driven request patterns. A sudden burst of webhook requests can saturate the receiver's connection pool, request queue, or downstream dependencies (databases, third-party APIs). The receiver usually responds to overload with 5xx errors or timeouts. The sender interprets these as failures and retries, which compounds the problem.

The right concurrency also depends on how much the sender can tolerate. High concurrency consumes connection pool slots, outbound bandwidth, and CPU on the sender side. A misbehaving receiver that holds connections open without responding can exhaust the sender's resources if the concurrency limit is too high.

The single-receiver vs cross-receiver distinction

Per-receiver concurrency limits prevent a single receiver from being hammered. Cross-receiver concurrency limits prevent the sender from being overwhelmed by total in-flight requests across all receivers. Both are necessary; neither alone is sufficient.
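One way to enforce both limits is sketched below in Python. It assumes a single asyncio process; a multi-node sender would need shared state, for example in Redis. Each delivery must hold a per-receiver slot and a slot from the global budget. `DeliveryLimiter` is a hypothetical name, not a library API.

```python
import asyncio
from collections import defaultdict

class DeliveryLimiter:
    """Two-level limit: a cross-receiver budget plus a per-receiver cap."""

    def __init__(self, global_limit, per_receiver_limit):
        self._global = asyncio.Semaphore(global_limit)
        self._per_receiver_limit = per_receiver_limit
        self._receivers = defaultdict(lambda: asyncio.Semaphore(self._per_receiver_limit))
        self.in_flight = defaultdict(int)
        self.peak_total = 0
        self.peak_per_receiver = defaultdict(int)

    async def run(self, receiver, send):
        # Per-receiver slot first, then global: always the same order, so no deadlock.
        async with self._receivers[receiver]:
            async with self._global:
                self.in_flight[receiver] += 1
                self.peak_total = max(self.peak_total, sum(self.in_flight.values()))
                self.peak_per_receiver[receiver] = max(
                    self.peak_per_receiver[receiver], self.in_flight[receiver]
                )
                try:
                    return await send()
                finally:
                    self.in_flight[receiver] -= 1

async def demo():
    limiter = DeliveryLimiter(global_limit=5, per_receiver_limit=2)

    async def send():
        await asyncio.sleep(0.01)  # simulated HTTP round trip

    # 40 deliveries spread across 4 hypothetical receivers.
    await asyncio.gather(*(limiter.run(f"receiver-{i % 4}", send) for i in range(40)))
    return limiter.peak_total, max(limiter.peak_per_receiver.values())

if __name__ == "__main__":
    print(asyncio.run(demo()))  # prints (5, 2)
```

The per-receiver slot is acquired before the global one. This means a busy receiver's backlog waits on its own semaphore instead of tying up global slots while it waits.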

Per-receiver limits are the customer-facing knob. The right default depends on the receiver type. Application servers typically handle 5-10 concurrent webhook deliveries fine. Serverless receivers (AWS Lambda, Cloudflare Workers) can usually handle more because the platform scales execution out with request volume. Platform concurrency quotas and the database behind the function still cap what they can absorb. Legacy systems behind connection-pool bottlenecks may need a limit of 1 or 2.

The configuration surface should let the customer adjust the per-receiver limit, with a sensible default. Stripe defaults to 1 concurrent delivery per endpoint, which is conservative but safe. GitHub does not document its concurrency policy explicitly but behaves like a small constant (roughly 3-5 in our observation). Higher defaults bias toward delivery speed; lower defaults bias toward receiver safety.
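The configuration surface can be sketched in Python. This sketch assumes the limit is stored per endpoint and validated when the customer saves it. The 1-10 range and default of 3 match the values described for our shared library elsewhere in this post. `EndpointConfig` and the constant names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical bounds, matching the 1-10 range and default of 3 described in this post.
MIN_CONCURRENCY = 1
MAX_CONCURRENCY = 10
DEFAULT_CONCURRENCY = 3

@dataclass(frozen=True)
class EndpointConfig:
    """Per-endpoint delivery settings that a customer can edit."""
    url: str
    max_concurrency: int = DEFAULT_CONCURRENCY

    def __post_init__(self) -> None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(self.max_concurrency, bool) or not isinstance(self.max_concurrency, int):
            raise TypeError("max_concurrency must be an integer")
        if not MIN_CONCURRENCY <= self.max_concurrency <= MAX_CONCURRENCY:
            raise ValueError(
                f"max_concurrency must be between {MIN_CONCURRENCY} and "
                f"{MAX_CONCURRENCY}, got {self.max_concurrency}"
            )
```

Out-of-range values are rejected rather than silently clamped. That way the setting the customer saved is exactly the one the sender enforces.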

The ordering interaction

Concurrency interacts with ordering in subtle ways. If your delivery system promises any ordering guarantee (per-resource, per-account, per-subscription), concurrency must be 1 within that scope. A concurrency of 2 means two requests are in flight simultaneously, and there is no guarantee which arrives at the receiver first or which gets processed first.

The right answer for most B2B SaaS is to document that webhook ordering is not guaranteed. Concurrency should still be set conservatively (1-3 per receiver), because most customer receivers are not designed for high concurrent webhook load. Honest documentation matters because customers who assume ordering will write code that breaks subtly under retry-induced reordering. This reordering happens even at concurrency 1: if one event fails and is scheduled for a later retry, the events behind it are delivered first.

If you do offer per-resource ordering as a product feature, the implementation requires per-resource queue partitioning with concurrency 1 within each partition. Total throughput scales with the number of partitions, which is enough for most workloads. A failing event also blocks everything behind it in its partition until it succeeds or is dead-lettered. The cost is a more complex delivery system and a customer-facing ordering promise that constrains your future architecture decisions.
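Here is a minimal sketch of per-resource partitioning in Python. It assumes a single asyncio process with in-memory queues; a production system would use a durable queue. Each resource ID hashes to one of `num_partitions` queues, and each queue has exactly one worker, which is what gives concurrency 1 per partition. `deliver_partitioned` and `send` are hypothetical names, and retries are omitted.

```python
import asyncio
import zlib

async def deliver_partitioned(events, num_partitions, send):
    """Deliver (resource_id, payload) events, preserving order per resource."""
    queues = [asyncio.Queue() for _ in range(num_partitions)]

    async def worker(queue):
        # Exactly one worker per partition: deliveries here are strictly sequential.
        while True:
            item = await queue.get()
            if item is None:  # sentinel: this partition is drained
                return
            resource_id, payload = item
            # A real sender would retry here before moving on (head-of-line blocking).
            await send(resource_id, payload)

    workers = [asyncio.create_task(worker(q)) for q in queues]
    for resource_id, payload in events:
        # crc32 is stable across processes, unlike Python's salted str hash().
        idx = zlib.crc32(resource_id.encode("utf-8")) % num_partitions
        queues[idx].put_nowait((resource_id, payload))
    for q in queues:
        q.put_nowait(None)
    await asyncio.gather(*workers)
```

Different partitions proceed in parallel. All events for a given resource pass through the same FIFO queue and single worker, so they arrive in the order they were enqueued.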

The backpressure mechanism

Receiver-side overload manifests as elevated 5xx rates, slow responses, or timeouts. The sender should respond to these signals by reducing concurrency, not just by retrying. Additive-increase/multiplicative-decrease (AIMD) adjustment covers both directions. Cutting concurrency sharply on failure gives the receiver room to recover, and raising it one step at a time on sustained success restores capacity once the receiver stabilizes.

The mechanism is similar to TCP congestion control. Start at a conservative concurrency, observe response times and error rates, increase carefully, decrease aggressively on failure. The implementation is per-receiver state in the delivery system, updated on every response. The benefit is graceful degradation under receiver overload rather than catastrophic retry storms.
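The per-receiver AIMD state can be sketched in Python. The thresholds are assumptions, not recommendations: the limit starts at 1, halves on any overload signal, and grows by one after `success_window` consecutive healthy responses. Two further choices are also assumptions: HTTP 429 counts as an overload signal, and every other status below 500 counts as healthy. `AimdLimit` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AimdLimit:
    """Per-receiver concurrency limit, adjusted AIMD-style like TCP congestion control."""
    limit: int = 1               # start conservative
    minimum: int = 1
    maximum: int = 10
    success_window: int = 20     # healthy responses required before +1 (assumption)
    slow_threshold_s: float = 2.0  # responses slower than this count as overload (assumption)
    _streak: int = field(default=0, repr=False)

    def on_response(self, status: Optional[int], latency_s: float) -> int:
        """Record one response; status=None means timeout or connection failure."""
        overloaded = (
            status is None
            or status >= 500
            or status == 429  # assumption: rate-limit responses are backpressure too
            or latency_s > self.slow_threshold_s
        )
        if overloaded:
            # Multiplicative decrease: halve, but never below the floor.
            self.limit = max(self.minimum, self.limit // 2)
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.success_window:
                # Additive increase: one more slot after a sustained healthy run.
                self.limit = min(self.maximum, self.limit + 1)
                self._streak = 0
        return self.limit
```

The sender would read `limit` before starting each delivery and call `on_response` after each one completes. A single failure undoes many successes, which is what makes the controller decrease aggressively and increase carefully.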

Simpler implementations omit the dynamic adjustment and use a fixed conservative concurrency limit. This is fine for most workloads and avoids the complexity of congestion-control-style algorithms. The trade-off is leaving some delivery throughput on the table for healthy receivers in exchange for safety against unhealthy ones.

The dashboard surface

The configuration knob should be visible to customers. Providers that expose it typically offer a per-endpoint setting with a sensible default and a small range of allowed values (commonly 1-10). The dashboard should show the current concurrency setting alongside the delivery success rate and average response time, because the customer needs both to decide whether to adjust the limit.

The recommendation is to start conservative (concurrency 1-2) and increase only if the customer's receiver demonstrably handles higher concurrency without elevated error rates. The dashboard guidance should make this explicit. Some providers offer auto-tuning based on observed receiver behavior, which removes the customer-side decision but trades it for a black-box adjustment they cannot reason about.
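The dashboard guidance can come from a simple one-step rule. This Python sketch assumes two observed inputs per endpoint: the error rate, and the fraction of time the endpoint spent at its limit. The 1 percent and 50 percent thresholds and the name `recommend_limit` are hypothetical.

```python
def recommend_limit(current: int, error_rate: float, saturated_fraction: float,
                    max_error_rate: float = 0.01, min_saturation: float = 0.5,
                    lowest: int = 1, highest: int = 10) -> int:
    """Suggest a new per-endpoint limit, moving at most one step from `current`.

    error_rate: share of deliveries that failed (5xx or timeout) in the window.
    saturated_fraction: share of the window during which in-flight == limit.
    """
    if error_rate > max_error_rate:
        # The receiver is struggling: suggest backing off.
        return max(lowest, current - 1)
    if saturated_fraction >= min_saturation:
        # Healthy and frequently at its cap: one more slot is likely to help.
        return min(highest, current + 1)
    # Healthy and not constrained by the limit: leave it alone.
    return current
```

Moving one step at a time keeps the guidance conservative. The customer raises the limit, observes the result, and only then gets a further suggestion.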

The cross-receiver budget

Total in-flight requests across all receivers should be bounded too. The right total depends on the sender's connection pool size, outbound bandwidth, and CPU. A typical sizing leaves enough headroom for the busiest 10 percent of customers to run at their per-receiver maximum without saturating the sender. The remaining capacity serves the long tail.
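The sizing rule reduces to simple arithmetic. This Python sketch assumes the long tail averages a fixed number of in-flight requests per endpoint; the 0.5 default is an assumption, not a measured figure. `global_budget` is a hypothetical helper.

```python
import math

def global_budget(num_endpoints: int, per_receiver_max: int,
                  busy_percent: int = 10, tail_avg_in_flight: float = 0.5) -> int:
    """Size the cross-receiver in-flight budget.

    Reserve full per-receiver capacity for the busiest `busy_percent` of endpoints,
    plus an assumed average in-flight count for each remaining endpoint.
    """
    busy = math.ceil(num_endpoints * busy_percent / 100)
    tail = num_endpoints - busy
    return busy * per_receiver_max + math.ceil(tail * tail_avg_in_flight)
```

Take 2,000 endpoints with a per-receiver maximum of 10. The busiest 200 need 2,000 slots and the remaining 1,800 need 900, for a budget of 2,900 concurrent requests. That figure then has to fit within the sender's connection pool.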

When the total budget is exhausted, new deliveries queue rather than start immediately. The queue depth is a key operational signal: a deep queue means the sender is undersized relative to the customer population. The remediation is either to scale the sender horizontally or to tighten per-receiver limits to make the total budget go further.

Three patterns that fail

First, unlimited per-receiver concurrency. The sender will eventually hammer a receiver that cannot handle it, the receiver will fail, and the sender will interpret the failure as needing retries, compounding the problem. The right default is conservative even at the cost of slower delivery.

Second, identical concurrency limit for all customers. Receiver capacity varies by orders of magnitude across customer types. A one-size-fits-all limit either over-constrains capable receivers or hammers fragile ones. The right design is a per-endpoint setting with a sensible default and customer-controlled adjustment.

Third, no monitoring of concurrency utilization. If you cannot see how close each receiver is running to its limit, you cannot diagnose why some receivers are slow or recommend limit adjustments. The minimum monitoring is per-receiver in-flight count, queue depth, and effective throughput.
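Here is a minimal sketch of those three per-receiver metrics in Python. It assumes the delivery loop calls hooks as deliveries are queued, started, and finished. Throughput counts only successful deliveries over a trailing window. The class name, hook names, and 60-second window are hypothetical, and the clock is injectable so the logic can be tested.

```python
import time
from collections import defaultdict, deque

class ReceiverStats:
    """Per-receiver in-flight count, queue depth, and effective throughput."""

    def __init__(self, window_s=60.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock  # injectable for tests
        self.in_flight = defaultdict(int)
        self.queue_depth = defaultdict(int)
        self._succeeded_at = defaultdict(deque)

    def on_enqueued(self, receiver):
        self.queue_depth[receiver] += 1

    def on_started(self, receiver):
        self.queue_depth[receiver] -= 1
        self.in_flight[receiver] += 1

    def on_finished(self, receiver, ok):
        self.in_flight[receiver] -= 1
        if ok:  # effective throughput counts successes only
            self._succeeded_at[receiver].append(self.clock())

    def throughput(self, receiver):
        """Successful deliveries per second over the trailing window."""
        now = self.clock()
        done = self._succeeded_at[receiver]
        while done and now - done[0] > self.window_s:
            done.popleft()
        return len(done) / self.window_s

    def snapshot(self, receiver, limit):
        """What a dashboard row might show: how close the receiver runs to its limit."""
        return {
            "in_flight": self.in_flight[receiver],
            "limit": limit,
            "utilization": self.in_flight[receiver] / limit,
            "queue_depth": self.queue_depth[receiver],
            "throughput_per_s": self.throughput(receiver),
        }
```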

Our use across the four products

DocuMint, CronPing, FlagBit, and WebhookVault all use a shared webhook delivery library with per-endpoint concurrency limits configurable from 1 to 10, defaulting to 3. The shared infrastructure benefits the smaller products (DocuMint and FlagBit have low webhook volume) because the operational mechanics are battle-tested by the higher-volume products (CronPing monitor state changes and WebhookVault delivery retries).

WebhookVault is the most aggressive on monitoring because the delivery system is the product: customers pay specifically for reliable webhook capture and replay, and the concurrency policy directly affects the customer experience. The dashboard shows per-endpoint in-flight count and queue depth alongside the conventional success-rate and response-time metrics. Customers can adjust the concurrency limit themselves, and the recommendation text reflects observed receiver behavior over the past week.

CronPing's limits matter mostly during bursts. Monitor state changes are relatively rare per customer (most monitors are healthy most of the time, with state changes only on transitions), so the default concurrency of 3 is rarely binding. The interesting case is the bulk-notification scenario, when many monitors transition simultaneously (a region-wide outage affecting hundreds of monitored services). There, the concurrency limit prevents the sender from overwhelming customer receivers during the incident.

FlagBit and DocuMint use the same library but exercise it less because their webhook volume is lower. The investment in shared infrastructure compounds across products, which is one of the reasons we share it rather than reimplementing it per product.

The deeper observation

Concurrency policy is a small but consequential decision in webhook system design, mostly invisible to customers until it bites. Most providers underdocument it, which means customers form expectations based on the retry policy alone and are surprised when the receiver-side experience does not match. The honest framing is that concurrency, retry, and backpressure are three separate dimensions of webhook delivery behavior, and all three need explicit documentation and configurable defaults. Treating webhook delivery as a system rather than as a single send-and-retry operation produces a better customer experience and a more predictable operational profile, which is mostly what customers want even when they cannot articulate it.

These patterns run in production across our products: DocuMint (PDF invoice generation API), CronPing (cron job monitoring with status pages), FlagBit (feature flags API for modern teams), and WebhookVault (webhook capture and replay).