Designing Webhook Delivery Order Guarantees: What Most APIs Promise and What They Actually Deliver

Webhook ordering is one of the topics where the gap between API documentation and reality is widest. Most providers say nothing explicit. Most consumers assume more than the provider guarantees. The result is a class of bugs that is rare enough to escape testing and persistent enough to never go ...

The webhook ordering question is one of those that almost never comes up during the integration walkthrough and almost always comes up during the first incident review. The customer's invoice was marked paid before it was created. The user's account showed deleted while a charge was processing. The feature flag was applied before the rollout was provisioned. In every case, the events were sent in the right order; the receiver processed them in the wrong order. The honest answer is that webhook delivery order is harder to guarantee than the integration docs typically suggest, and most teams should design for unordered delivery from the start.

What ordering would even mean

The first complication is that "ordering" can mean several different things. Per-resource ordering is the narrowest: all events for a given customer, invoice, or flag arrive in the order they were produced, but events for different resources can interleave freely. Per-account ordering is broader: all events for the same customer account, across every resource in it, arrive in account-internal order, while events from different accounts can interleave. Global ordering is the strictest of all: all events from the provider arrive in a single total order. Most consumers need only per-resource ordering and assume they get it; most providers do not guarantee even that.
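One way to make the three scopes concrete is to check a delivery log against each of them. The sketch below is hypothetical: it assumes each delivered event carries the provider's emission position (emitted_seq) plus account and resource fields, which real payloads may not expose in that form.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable


@dataclass(frozen=True)
class Delivery:
    emitted_seq: int  # hypothetical: position in the provider's source order
    account: str
    resource: str


def order_holds(log: Iterable[Delivery], scope: Callable[[Delivery], Hashable]) -> bool:
    """True if, within every scope group, deliveries arrive in emission order.

    `log` is in arrival order; `scope` maps a delivery to its ordering group.
    """
    last_seen: dict[Hashable, int] = {}
    for d in log:
        key = scope(d)
        if key in last_seen and d.emitted_seq < last_seen[key]:
            return False  # an older event arrived after a newer one in this group
        last_seen[key] = d.emitted_seq
    return True


def per_resource(d: Delivery) -> Hashable:
    return (d.account, d.resource)


def per_account(d: Delivery) -> Hashable:
    return d.account


def global_scope(d: Delivery) -> Hashable:
    return None  # every delivery falls into one group
```

For example, the arrival log [Delivery(2, "acct_1", "inv_b"), Delivery(1, "acct_1", "inv_a"), Delivery(3, "acct_1", "inv_a")] satisfies per-resource ordering while violating both per-account and global ordering.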

The reason providers cannot offer global ordering is that the events are produced by many different parts of the system in parallel, and serializing them all through a single ordering step would create a bottleneck. The reason providers struggle even with per-resource ordering is that retries and parallelism in the delivery pipeline can reorder events that were emitted in order at the source.

Why retries break ordering

The canonical reordering scenario runs as follows. Event A is emitted, its delivery fails, and it enters the retry queue with a 30-second backoff. Event B is emitted after A in source order, and its delivery succeeds. Event A's retry succeeds about 30 seconds later, long after B has arrived. The receiver sees B before A even though A was emitted first.
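The scenario can be reproduced in a few lines. This is a toy model, not any provider's pipeline: it assumes a fixed 30-second backoff, a constant network latency, a single retry, and a pipeline that keeps delivering later events while an earlier one waits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Emission:
    name: str
    emitted_at: float  # seconds since an arbitrary origin
    first_attempt_fails: bool


def arrival_order(emissions, backoff=30.0, latency=0.1):
    """Toy model: one retry after a fixed backoff; later events are never stalled."""
    arrivals = []
    for e in emissions:
        first_result = e.emitted_at + latency  # first attempt completes or fails
        if e.first_attempt_fails:
            arrived = first_result + backoff + latency  # retry after backoff, then succeeds
        else:
            arrived = first_result
        arrivals.append((arrived, e.name))
    return [name for _, name in sorted(arrivals)]
```

With A emitted at t=0 and failing once, and B emitted at t=1 and succeeding, the arrival order is B, then A. Nothing in either payload tells the receiver that this happened.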

This is not a bug in the provider's delivery pipeline; it is the predictable consequence of any retry strategy that does not stall all subsequent events when an earlier one fails. Stalling subsequent events for a single failed event is a denial-of-service vulnerability (a single misbehaving event can block the entire pipeline) and is rarely chosen by webhook providers for that reason.

The receiver-side consequence is that any webhook integration that retries can produce out-of-order delivery. The receiver cannot tell from the message itself whether it was delivered in order; the receiver has to design for reordering at the application layer.

Why parallelism breaks ordering

The second source of reordering is parallel delivery. Most webhook providers have multiple worker processes or threads consuming the same delivery queue. Even if events are placed in the queue in order, the workers can pull them off in parallel and complete deliveries in different orders depending on network latency, receiver response time, and worker scheduling.
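A small discrete-event model shows the effect. It is a sketch under stated assumptions: each delivery takes a fixed duration standing in for network latency plus receiver response time, and each idle worker takes the next event from a shared FIFO queue.

```python
import heapq


def completion_order(queue, workers=2):
    """queue: list of (event_name, delivery_seconds) in enqueue order.

    The earliest-idle worker pulls the next event; returns names in completion order.
    """
    free_at = [0.0] * workers  # time at which each worker becomes idle (a valid heap)
    done = []
    for seq, (name, duration) in enumerate(queue):
        start = heapq.heappop(free_at)  # earliest idle worker takes the event
        finish = start + duration
        heapq.heappush(free_at, finish)
        done.append((finish, seq, name))
    return [name for _, _, name in sorted(done)]
```

With a queue of A (2.0 s), B (0.5 s), C (0.5 s) and two workers, deliveries complete in the order B, C, A. With a single worker the queue order is preserved, at the cost of throughput.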

The mitigation is to partition the delivery pipeline by resource ID so that all events for a given resource go through the same worker, one delivery at a time. This works in principle and is what providers offering per-resource ordering implement under the hood. The cost is head-of-line blocking: a slow or failing delivery for one resource delays every later event for that resource and, when many resources share a partition, every other event queued behind it in that partition. The benefit is honest per-resource ordering. Whether the trade-off is worth it depends on event volume and on how much damage reordering does for the specific event types.
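A minimal sketch of the partitioning step, assuming a fixed number of partitions each drained by a single worker. The function names are hypothetical. It uses hashlib rather than Python's built-in hash(), because hash() on strings is salted per process and would send the same resource to different partitions after a restart.

```python
import hashlib
from collections import defaultdict


def partition_for(resource_id: str, partitions: int) -> int:
    """Stable mapping from resource ID to partition: same input, same worker."""
    digest = hashlib.sha256(resource_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def partition_queue(events, partitions):
    """Split (resource_id, event_name) pairs into per-partition FIFO lists.

    If each list is consumed by one worker, one delivery at a time, events
    for a given resource keep their emission order.
    """
    queues = defaultdict(list)
    for resource_id, name in events:
        queues[partition_for(resource_id, partitions)].append((resource_id, name))
    return dict(queues)
```

The same mapping is what produces the head-of-line blocking: a stuck delivery at the front of one partition's list delays everything behind it, including events for unrelated resources that happen to hash to the same partition.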

What the textbook says

Stripe's documentation states "Stripe does not guarantee delivery of events in the order in which they are generated." GitHub's webhook docs say roughly the same. Linear says explicitly that webhooks "may arrive out of order." Most other providers say nothing, which in practice means the same thing. The honest reading is that webhook delivery is unordered until proven otherwise.

The consequence for receivers is that the receiver cannot rely on order without taking specific defensive measures. The measures fall into three categories, and most production-quality webhook integrations use all three.

Defensive measure one: idempotency by event ID

The first defense is idempotency. Every webhook event should have a stable identifier (the provider's event_id), and the receiver should record which event IDs it has processed and skip duplicates. This does not solve the ordering problem directly, but it makes retries safe so the receiver does not double-apply an event that arrives twice.

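The minimal idempotency table has two columns, (event_id, processed_at), with event_id as the primary key. The receiver inserts a row at the start of the handler; if the insert fails on the unique constraint, the event was already processed and the handler returns immediately. The insert must commit in the same transaction as the handler's side effects: if it commits on its own and the handler then fails, the provider's retry is skipped as a duplicate and the event is never applied. With that precaution, the table handles the duplicate-delivery case completely.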
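A minimal sketch using Python's built-in sqlite3 module. The table names, the payments side effect, and the handle_event signature are hypothetical. The point it illustrates is that the idempotency insert and the side effects share one transaction, so a handler that fails after the insert rolls the record back and the provider's retry is processed normally.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_events ("
        " event_id TEXT PRIMARY KEY,"
        " processed_at TEXT NOT NULL)"
    )
    # Hypothetical side-effect table standing in for real business data.
    conn.execute("CREATE TABLE IF NOT EXISTS payments (invoice_id TEXT, amount INTEGER)")
    return conn


def record_payment(conn, event):
    conn.execute("INSERT INTO payments VALUES (?, ?)", (event["invoice_id"], event["amount"]))


def handle_event(conn, event, apply):
    """Apply `event` at most once. Returns False for an already-processed event ID."""
    with conn:  # one transaction: commit on success, roll back if apply() raises
        try:
            conn.execute(
                "INSERT INTO processed_events (event_id, processed_at) VALUES (?, ?)",
                (event["id"], datetime.now(timezone.utc).isoformat()),
            )
        except sqlite3.IntegrityError:
            return False  # primary-key conflict: duplicate delivery
        apply(conn, event)
    return True
```

The same pattern carries over to any database with transactional unique constraints; only the exception type changes. Side effects outside the database, such as sending an email, cannot join the transaction and need their own deduplication.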

Defensive measure two: sender-supplied timestamps

The second defense is to use the event's source timestamp rather than the receive timestamp for any business logic that depends on time order. If event A has a source timestamp of 14:00:00 and event B has a source timestamp of 14:00:30, the receiver can determine the correct order even if they arrive in the reverse order.

The complication is that source timestamps are only useful for ordering events that affect the same resource. Two events on different resources from the same provider may have skewed timestamps (because they were generated in parallel by different parts of the system), and using source timestamps to globally order events from different resources will produce inconsistent results.

The practical pattern is to use source timestamps for per-resource ordering and to record both the source timestamp and the receive timestamp on every event so that out-of-order arrivals can be detected and audited. If a charge.succeeded arrives before the customer.created event for the same customer, the receiver can log the discrepancy and either wait for the customer.created or fetch the customer record directly from the provider's API.
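A minimal sketch of the per-resource pattern, assuming each event exposes an ID, a resource ID, and a source timestamp; the ArrivalLog class is hypothetical. Equal timestamps are treated as already seen, which is a deliberate simplification: providers that stamp events at coarse resolution will produce ties, which is one reason to pair this with the third defense.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ArrivalLog:
    """Per-resource ordering by source timestamp, keeping both timestamps for audit."""

    # resource_id -> newest source timestamp applied so far
    latest_source: dict[str, datetime] = field(default_factory=dict)
    # (event_id, resource_id, source_at, received_at, applied)
    audit: list[tuple] = field(default_factory=list)

    def accept(self, event_id: str, resource_id: str,
               source_at: datetime, received_at: datetime) -> bool:
        """Return True if the event is newer than anything applied for this resource."""
        newest = self.latest_source.get(resource_id)
        applied = newest is None or source_at > newest
        if applied:
            self.latest_source[resource_id] = source_at
        # Record every arrival, applied or not, so reordering can be audited later.
        self.audit.append((event_id, resource_id, source_at, received_at, applied))
        return applied

    def stale_arrivals(self) -> list[str]:
        """Event IDs that arrived after a newer event for the same resource."""
        return [row[0] for row in self.audit if not row[4]]
```

Timestamps from different resources are never compared, matching the caveat about skew across parts of the provider's system.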

Defensive measure three: model events as state transitions

The third defense is the most powerful and also the most invasive. Instead of treating webhook events as commands to apply ("create this invoice"), treat them as state transitions to evaluate ("the invoice is now in this state"). The receiver maintains a model of the current state of each resource and ignores events that would represent a backward transition.

For example, if the receiver has already recorded that an invoice is in state "paid" and a webhook arrives indicating the invoice is in state "pending," the receiver ignores the event because pending is an earlier state than paid. The receiver does not need to wait for events to arrive in order; the receiver just needs to recognize when an arriving event is older than the current known state and discard it.
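The comparison in the example reduces to ranking states. A minimal sketch, assuming a hypothetical linear invoice lifecycle of draft, pending, paid; a real provider defines its own states, and the function name is illustrative.

```python
# Hypothetical linear invoice lifecycle; real providers define their own states.
INVOICE_STATE_RANK = {"draft": 0, "pending": 1, "paid": 2}


def apply_invoice_state(current: dict, invoice_id: str, new_state: str) -> bool:
    """Record new_state unless it would move the invoice backward.

    Returns True if the stored state changed; duplicates and older states are ignored.
    """
    old_state = current.get(invoice_id)
    if old_state is not None and INVOICE_STATE_RANK[new_state] <= INVOICE_STATE_RANK[old_state]:
        return False
    current[invoice_id] = new_state
    return True
```

A linear rank only works when the lifecycle has no branches; a state such as void that can follow pending but never paid needs a full transition graph rather than a rank.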

This approach requires the receiver to model the state machine of each resource type, which is more work than handling each event independently. The benefit is that the receiver is robust to out-of-order delivery, duplicate delivery, and missing events (because the receiver can fetch the current state from the provider's API if it suspects it has missed something). The cost is that the receiver and the provider have to agree on what the state machine looks like, which is a contract that is not always explicit in the provider's documentation.
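When the lifecycle branches, the receiver can model it as an explicit transition graph. The sketch below assumes a hypothetical, acyclic invoice graph; the real graph has to come from the provider's contract. It also separates two kinds of forward movement: a direct successor can be applied, while a state several steps ahead suggests intermediate events were missed or are still in flight, which is the point at which fetching the resource from the provider's API becomes worthwhile.

```python
from collections import deque
from typing import Optional

# Hypothetical acyclic transition graph; the real one must come from the provider.
TRANSITIONS = {
    "draft": {"pending"},
    "pending": {"paid", "void"},
    "paid": {"refunded"},
    "void": set(),
    "refunded": set(),
}


def reachable(src: str, dst: str) -> bool:
    """True if dst can be reached from src by one or more forward transitions."""
    seen, queue = {src}, deque([src])
    while queue:
        state = queue.popleft()
        for nxt in TRANSITIONS[state]:
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def evaluate(current: Optional[str], incoming: str) -> str:
    """Classify an incoming state against the stored one.

    'apply'   : first sighting, or a direct successor of the stored state
    'skipped' : forward but not adjacent; intermediate events missing or in flight
    'ignore'  : no forward path from the stored state (duplicate, older, or
                inconsistent with the model)
    """
    if current is None or incoming in TRANSITIONS[current]:
        return "apply"
    if reachable(current, incoming):
        return "skipped"
    return "ignore"
```

A receiver using this would typically store the incoming state on 'apply', refetch from the API on 'skipped', and log and drop on 'ignore'.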

What does not work

Sequence numbers per resource sound like they should work and mostly do not. The first problem is that the provider has to maintain a per-resource counter, increment it atomically on every event, and survive crashes without losing or duplicating numbers. This is achievable but expensive, and most providers do not offer it. The second problem is that even with sequence numbers, a gap tells the receiver only that an event is missing, not whether it was lost or is merely delayed, so the receiver is still left choosing between waiting and fetching the current state from the provider's API.
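A sketch of what a receiver could do if a provider did supply gap-free per-resource sequence numbers, which most do not. The numbering scheme (1, 2, 3, ... per resource, never reused) and the classify function are assumptions for illustration. The GAP branch is where the ambiguity lives.

```python
from enum import Enum


class SeqResult(Enum):
    NEXT = "next"    # exactly the expected number: safe to apply
    STALE = "stale"  # at or below the last applied number: duplicate or late
    GAP = "gap"      # ahead of expected: earlier events lost or still in flight


def classify(last_applied: dict, resource_id: str, seq: int) -> SeqResult:
    """Compare a per-resource sequence number with the last one applied.

    Assumes the provider numbers each resource's events 1, 2, 3, ... with no reuse.
    """
    expected = last_applied.get(resource_id, 0) + 1
    if seq == expected:
        last_applied[resource_id] = seq
        return SeqResult.NEXT
    if seq < expected:
        return SeqResult.STALE
    # A GAP cannot say why the earlier numbers are missing; the receiver must
    # wait (blocking this resource) or fetch current state from the provider's API.
    return SeqResult.GAP
```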

Time-window buffering ("wait 30 seconds for events to settle before processing") sounds reasonable but has two obvious failure modes: it delays every event by 30 seconds, and it is still wrong whenever the delivery delay exceeds the buffering window. Buffering moves the problem from "events may arrive out of order" to "events may have arrived but you decided not to process them yet," which is rarely an improvement.

Strict ordering enforcement at the receiver (rejecting any event that arrives out of order and waiting for the missing one) creates head-of-line blocking and turns a delivery delay into a complete stall.

What this looks like across our four products

Of our four products, WebhookVault is the one most directly involved in webhook ordering, because it captures, inspects, and replays webhooks for debugging. The capture endpoint records every webhook request with a source-time header (if the sender provides one), a receive-time header (always recorded), and a sequence-within-endpoint counter, so customers can debug their own ordering issues by reviewing the captured events. The replay feature preserves the original timestamps so the receiver sees the same Date headers as the original delivery.

DocuMint consumes Stripe webhooks for payment events (customer.subscription.created, customer.subscription.updated, invoice.paid) and, in line with Stripe's documentation, treats their delivery as unordered; our internal receiver maintains a state model per subscription and processes events as state transitions. CronPing and FlagBit emit webhooks for resource state changes (monitor.failed, flag.updated) and document that delivery is unordered; the documentation includes a worked example of the state-machine pattern for receivers.

The deeper observation is that webhook ordering is a contract that the provider rarely specifies explicitly and the consumer rarely reads carefully. The result is a class of bugs that live in production for years because they manifest only under network conditions or retry behavior that the integration test suite does not reproduce. The teams that avoid these bugs are the ones that read the providers' documentation carefully, design the receiver to be order-independent by default, and treat any assumption of order as a defect to be removed.
