Webhook Receivers That Don't Lose Events: Patterns for the Consumer Side
Most webhook content on the internet focuses on the sender side: how to design payload schemas, how to retry, how to sign requests, how to version the contract. This is appropriate — webhook senders are usually SaaS products with engineering teams whose explicit job is to think about delivery semantics, and the design choices they make echo across thousands of integrations.
The consumer side gets much less attention, and it is where most of the actual production bugs live. The webhook receiver in your application is typically a small handler bolted onto an HTTP server that someone wrote in a hurry to integrate Stripe in 2022 and has not touched since. It works fine until it doesn't, and "doesn't" usually means an event was processed twice, or dropped silently, or processed three weeks late after a queue backlog drained, or processed without error but against the wrong customer because of a payload schema change the sender announced two months ago.
This post collects the small set of patterns that separate receivers that survive real production from receivers that quietly accumulate correctness bugs nobody notices for months. All four of our products receive webhooks, and some of them also send webhooks, so we have specific scars from both sides.
Ack-fast queue-and-process
The single most important pattern is to acknowledge the webhook quickly and process it asynchronously. The receiver's HTTP handler does three things: verifies the signature, persists the raw event to a queue or database, and returns 200. Everything else — business logic, downstream API calls, database writes, customer notifications — happens in a separate worker that consumes from the queue.
The reason this matters is that the sender's retry policy is your problem. If your handler does ten seconds of synchronous processing and the sender times out at five seconds, the sender will retry while you are still processing the first attempt. You now have two concurrent processings of the same event, with all the correctness bugs that implies. The ack-fast pattern removes this failure mode: the sender sees a fast 200 every time, and the work happens in the background, where slowness cannot trigger a retry.
The queue does not have to be Kafka. It does not have to be Redis. It does not even have to be a queue. A database table with an "unprocessed" status flag, polled by a worker process, is operationally simpler and is usually enough for workloads up to roughly a million events per day. The pattern matters more than the implementation.
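A minimal sketch of the table-as-queue variant, using Python's built-in sqlite3 so it runs anywhere. The table name webhook_events, its columns, and the verify callback are hypothetical; a real deployment would use its production database and the sender's own verification routine. The handler commits the raw bytes before returning 200, so an acknowledgment always means the event is stored, and the worker marks a row processed only after the business logic returns.

```python
import json
import sqlite3
import time
from typing import Callable, Mapping

# Hypothetical queue table: processed_at stays NULL until a worker has handled the row.
QUEUE_SCHEMA = """
CREATE TABLE IF NOT EXISTS webhook_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    received_at REAL NOT NULL,
    raw_body BLOB NOT NULL,
    processed_at REAL
)
"""

Verifier = Callable[[bytes, Mapping[str, str]], bool]


def handle_request(conn: sqlite3.Connection, raw_body: bytes,
                   headers: Mapping[str, str], verify: Verifier) -> int:
    """HTTP handler body: verify, persist the raw bytes, acknowledge. No business logic."""
    if not verify(raw_body, headers):
        return 401
    with conn:  # commit before acknowledging, so a 200 always means the event is stored
        conn.execute(
            "INSERT INTO webhook_events (received_at, raw_body) VALUES (?, ?)",
            (time.time(), raw_body),
        )
    return 200


def run_worker_once(conn: sqlite3.Connection, process: Callable[[dict], None],
                    batch_size: int = 100) -> int:
    """One polling pass: run business logic for unprocessed rows in arrival order."""
    rows = conn.execute(
        "SELECT id, raw_body FROM webhook_events "
        "WHERE processed_at IS NULL ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for row_id, raw_body in rows:
        # Slow work happens here, off the request path. If it raises, the row stays
        # unprocessed and a later polling pass picks it up again.
        process(json.loads(raw_body))
        with conn:
            conn.execute(
                "UPDATE webhook_events SET processed_at = ? WHERE id = ?",
                (time.time(), row_id),
            )
    return len(rows)
```

A real worker would also catch exceptions per row and cap retries, so one malformed event cannot block the rows behind it. With more than one worker, each needs to claim rows so two workers do not pick up the same event; on PostgreSQL the usual tool is SELECT ... FOR UPDATE SKIP LOCKED.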
Signature verification on raw bytes, before anything else
Webhook signature verification has one rule that almost everyone gets wrong eventually: the signature is computed over the raw request body bytes, not the parsed JSON. If your framework parses the body before your handler sees it, then re-serializes when you ask for the bytes, the re-serialization will not match the original — different key ordering, different whitespace, different number formatting — and verification will fail intermittently in ways that depend on the sender's serialization choices.
The fix is to capture the raw body bytes early in the request lifecycle, before any parser sees them, and verify the signature against those bytes. Frameworks make this harder than it should be: Express needs the express.raw() middleware on the webhook route, FastAPI needs await request.body() before any model parsing, and Rails needs request.raw_post. The recipe is framework-specific and usually documented in the sender's integration guide; if you copy from a generic webhook tutorial without checking the sender's docs, you will get this wrong.
Verification has to happen before any logic that trusts the payload: before you persist the event to your queue, before you log the payload, before any side effect. Otherwise an attacker who can send arbitrary unsigned events to your endpoint can fill your queue with garbage that your worker then tries to process.
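A sketch of that ordering, assuming a sender that signs with a hex-encoded HMAC-SHA256 over the raw body (GitHub's X-Hub-Signature-256 works this way, with a sha256= prefix this sketch omits). The function names are illustrative. The check runs on the bytes exactly as received, and JSON parsing happens only after it passes.

```python
import hashlib
import hmac
import json
from typing import Optional, Tuple


def verify_signature(raw_body: bytes, signature_hex: str, secret: bytes) -> bool:
    """Check a hex HMAC-SHA256 signature computed over the exact bytes received."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison, so timing does not reveal how much of the signature matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def receive(raw_body: bytes, signature_hex: str, secret: bytes) -> Tuple[int, Optional[dict]]:
    # 1. Verify the untouched bytes before parsing, logging, or persisting anything.
    if not verify_signature(raw_body, signature_hex, secret):
        return 401, None
    # 2. Only a verified body is parsed; persisting it to the queue would come next.
    return 200, json.loads(raw_body)
```

Passing json.dumps(json.loads(body)).encode() to verify_signature instead of the original bytes fails as soon as the sender's whitespace or key order differs from Python's defaults, which is the re-serialization bug described above.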
Timestamp-based replay protection
Signature verification proves the payload was generated by someone with the signing secret. It does not prove the payload was generated recently. An attacker who captures a valid signed webhook (from network interception, log scraping, or a compromised intermediary) can replay it indefinitely and your verification will pass every time.
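The fix is to include a timestamp in the signed material and reject events whose timestamp falls outside a tolerance window. Stripe does this with the t= field in its Stripe-Signature header: the signature covers the timestamp and the body together, so the timestamp cannot be altered without invalidating it, and Stripe signs a fresh timestamp on every delivery attempt so that legitimate retries still pass. GitHub does not: its X-Hub-Signature-256 header covers only the body, so GitHub receivers have to lean on deduplication by delivery ID instead. Custom signers should follow the Stripe pattern. The tolerance window trades protection against replay (smaller is better) against tolerance for clock skew between sender and receiver (larger is better); five minutes is the value most senders converge on.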
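A sketch of timestamp-plus-signature verification, modeled on the t=...,v1=... layout of Stripe's Stripe-Signature header, where the signed material is the timestamp, a period, and the raw body. It is illustrative only: with Stripe itself, use the official library (stripe.Webhook.construct_event in Python). The 300-second constant is the five-minute window; rejecting timestamps that are too far in the future as well as too old is a choice of this sketch.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # the conventional five-minute window


def verify_timestamped(raw_body: bytes, header: str, secret: bytes,
                       now: Optional[float] = None) -> bool:
    """Verify a 't=<unix seconds>,v1=<hex>' header whose signature covers '<t>.<body>'."""
    fields: dict = {}
    for item in header.split(","):
        key, _, value = item.partition("=")
        fields.setdefault(key.strip(), []).append(value.strip())
    try:
        timestamp_raw = fields["t"][0]
        timestamp = int(timestamp_raw)
    except (KeyError, ValueError):
        return False
    # The timestamp is part of the signed material, so editing it breaks the signature.
    signed = timestamp_raw.encode() + b"." + raw_body
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest().encode()
    # A header may carry several v1 entries (e.g. during secret rotation); accept any match.
    if not any(hmac.compare_digest(expected, sig.encode()) for sig in fields.get("v1", [])):
        return False
    current = time.time() if now is None else now
    # Reject deliveries whose timestamp is too far from our clock in either direction.
    return abs(current - timestamp) <= TOLERANCE_SECONDS
```

The now parameter exists so the window can be tested deterministically; in a handler it is left as None.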
Idempotency by stable event ID
Even with ack-fast and signature verification, you will receive duplicate events. The sender's retry policy is the most common source: the sender did not get your 200 (because the network dropped it, because the load balancer timed out, because you redeployed), so it retried, and you now have two deliveries that represent the same business event.
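The deduplication primitive is the event ID, a stable identifier supplied by the sender. Stripe events have an id field. GitHub deliveries carry an X-GitHub-Delivery header. Be careful with payloads where the obvious id identifies the affected object rather than the event: Linear's data.id, for example, is the ID of the issue or comment that changed, so two distinct updates to the same issue share it. Whatever event-level identifier the sender provides, store it in your database with a unique constraint and let the constraint violation be your dedup signal.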
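The implementation pattern is an INSERT-or-skip of the event ID into a "processed_events" table, in the same database transaction as whatever side effect the event triggers. If the INSERT succeeds, do the side effect; if it fails on the unique constraint, you have already processed this event and can skip it. The atomicity matters: a crash between the INSERT and the side effect must not leave the event marked as processed when its side effect never happened. That guarantee only holds when the side effect is a write to the same database. For external side effects such as API calls or emails, record the event ID only after the call succeeds, and pass the event ID as an idempotency key to the downstream API if it accepts one.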
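A sketch of INSERT-or-skip inside one transaction, again using sqlite3. The processed_events and credits tables and the event fields (id, customer, amount) are hypothetical. The side effect here is a write to the same database, which is what lets a single transaction cover both: if the balance update fails, the event-ID row rolls back with it, and a later retry is not mistaken for a duplicate.

```python
import sqlite3

IDEMPOTENCY_SCHEMA = """
CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
CREATE TABLE credits (customer TEXT PRIMARY KEY, balance INTEGER NOT NULL);
"""


def apply_event(conn: sqlite3.Connection, event: dict) -> str:
    """Record the event ID and apply its side effect in a single transaction."""
    with conn:  # commits all writes together, or rolls them all back if anything raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event["id"],),
        )
        if cur.rowcount == 0:
            return "duplicate"  # the unique key already exists: skip the side effect
        # The side effect: credit the customer's balance in the same transaction.
        conn.execute(
            "INSERT OR IGNORE INTO credits (customer, balance) VALUES (?, 0)",
            (event["customer"],),
        )
        conn.execute(
            "UPDATE credits SET balance = balance + ? WHERE customer = ?",
            (event["amount"], event["customer"]),
        )
    return "applied"
```

Calling apply_event twice with the same event credits the balance once; the second call returns "duplicate" and writes nothing.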
Out-of-order delivery is the default
Webhook senders generally do not promise ordered delivery. Even when they try to maintain order, retries break it: event A arrives first, fails, gets queued for retry; event B arrives second, succeeds; event A's retry succeeds five minutes later, and your handler now sees A after B even though A was generated first.
The receiver-side fix is to design business logic that does not depend on receipt order. Two patterns work: use the sender-supplied timestamp on each event for ordering decisions, ignoring any event older than the state you already hold; or treat the webhook as an invalidation signal and re-fetch authoritative state from the sender's API in the worker. Both are more work than naive in-order processing, and both remain correct under real-world delivery semantics.
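A sketch of the first pattern: a conditional write that applies a status change only if its sender timestamp is newer than the one already stored. The subscriptions table and its columns are hypothetical; event_created stands in for whatever creation timestamp the sender includes (Stripe events carry a created field in Unix seconds).

```python
import sqlite3

ORDERING_SCHEMA = """
CREATE TABLE subscriptions (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    event_created INTEGER NOT NULL  -- sender timestamp of the event that set this status
)
"""


def apply_status(conn: sqlite3.Connection, sub_id: str, status: str,
                 event_created: int) -> bool:
    """Apply a status change only if it is newer than what is stored. True if applied."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO subscriptions (id, status, event_created) VALUES (?, ?, ?)",
            (sub_id, status, event_created),
        )
        if cur.rowcount == 1:
            return True  # first event seen for this subscription
        cur = conn.execute(
            "UPDATE subscriptions SET status = ?, event_created = ? "
            "WHERE id = ? AND event_created < ?",
            (status, event_created, sub_id, event_created),
        )
        return cur.rowcount == 1  # False means a newer event already won
```

Because the comparison is strict, two events stamped in the same second cannot be ordered this way; when that matters, the re-fetch pattern is the safer choice.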
Schema evolution without breaking
The sender will eventually add fields to their webhook payloads. They will eventually deprecate fields. They will eventually rename fields, despite all the advice not to, because someone in their product organization will not be told no. Your receiver needs to handle this without breaking.
The discipline is: parse permissively, validate the fields you need, ignore fields you do not, never assume a field will be present unless the sender's docs guarantee it, and pin to the version of the schema your code was tested against if the sender supports versioning. Stripe lets you set an API version on each webhook endpoint; if you do not, events are rendered with your account's default API version, so upgrading that default (perhaps for an unrelated integration) silently changes the payloads your receiver sees. Pin explicitly, schedule version upgrades as deliberate work, and treat unpinned webhooks as a bug in your integration.
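A sketch of permissive parsing: extract only the fields the handler needs, fail loudly with the missing path when a required one is absent, and never read anything else. The field names follow the shape of a Stripe invoice event (id at the top, the invoice under data.object), but which fields count as required versus optional here is an assumption for illustration; the sender's docs are the authority on what is guaranteed.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


class MissingField(ValueError):
    """Raised when a field the handler depends on is absent from the payload."""


@dataclass(frozen=True)
class InvoicePaid:
    event_id: str
    customer_id: str
    amount_paid: int
    currency: Optional[str]  # treated as optional in this sketch


def _require(payload: Any, *path: str) -> Any:
    """Walk a nested dict; raise MissingField naming the full path if any key is absent."""
    value = payload
    for key in path:
        if not isinstance(value, dict) or key not in value:
            raise MissingField(".".join(path))
        value = value[key]
    return value


def parse_invoice_paid(raw_body: bytes) -> InvoicePaid:
    payload = json.loads(raw_body)
    event_id = _require(payload, "id")
    customer_id = _require(payload, "data", "object", "customer")
    amount_paid = _require(payload, "data", "object", "amount_paid")
    invoice = payload["data"]["object"]  # known to be a dict after the checks above
    # Unknown fields are never read, so additions by the sender cannot break this code.
    return InvoicePaid(
        event_id=event_id,
        customer_id=customer_id,
        amount_paid=int(amount_paid),
        currency=invoice.get("currency"),  # None if the sender omits it
    )
```

A payload with extra top-level or nested fields parses unchanged, while a payload missing data.object.customer raises MissingField("data.object.customer"), which a worker can count and alert on.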
Observability that catches the silent failures
Webhook receivers are unusually prone to failing silently. The handler returns 200 but the worker crashes. The signature verification passes but the parser produces garbage on a new payload variant. The dedup logic drops events that were actually distinct because the sender reused an ID across event types.
The observability that catches these is: count of received events per type, count of processed events per type with status (success/error/duplicate), age of the oldest unprocessed event in the queue, and count of signature verification failures per source. The age-of-oldest-unprocessed metric is the highest-leverage one: under normal load it stays near zero, so if it climbs and keeps growing, your worker is broken or your queue is wedged, and an alert on it tells you in five minutes instead of five days.
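A sketch of the age-of-oldest-unprocessed metric as a single query, assuming a hypothetical queue table webhook_events with a received_at Unix timestamp and a processed_at column that stays NULL until the worker finishes. Exporting the number as a gauge and alerting on a threshold is left to whatever metrics system is in use.

```python
import sqlite3
import time
from typing import Optional


def oldest_unprocessed_age_seconds(conn: sqlite3.Connection,
                                   now: Optional[float] = None) -> float:
    """Age of the oldest event still waiting for the worker; 0.0 when nothing is pending."""
    (oldest,) = conn.execute(
        "SELECT MIN(received_at) FROM webhook_events WHERE processed_at IS NULL"
    ).fetchone()
    if oldest is None:  # MIN over zero rows is NULL
        return 0.0
    current = time.time() if now is None else now
    return max(0.0, current - oldest)
```

Processed rows are excluded by the WHERE clause, so an old event that was handled long ago does not inflate the metric.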
Why this matters across our four products
Each of our products receives webhooks from somewhere. DocuMint receives Stripe events for subscription lifecycle. CronPing receives Stripe events and is also a webhook sender: when a monitor misses its schedule, CronPing fires a webhook to a customer-supplied URL. FlagBit receives Stripe events. WebhookVault is a receiver as a service; its entire product is being a really good webhook receiver that captures, inspects, and replays whatever the sender supplies. The patterns above are the ones that survived contact with real production traffic across all four products. The pattern that does not work is copying a tutorial from 2018 that ignores idempotency and raw-byte signature verification; that handler will look fine in development and accumulate silent correctness bugs in production until a customer reports something inexplicable months later.