Vol. IV · No. 04 Monday · 29 June 2026
engineering Dispatch 3 min read · 27 Apr 2026

Handling Stripe Webhooks Without Losing Money or Sleep

Most Stripe webhook integrations look fine until the day you learn what 'eventually consistent' really costs. The patterns that prevent that day are small, specific, and almost never the ones in the quickstart.

engineering · Curiosity

Stripe's quickstart shows you a fifteen-line webhook handler that returns 200 OK to checkout.session.completed and updates a row in your database. It is enough to demo the integration in an interview and exactly enough to lose money in production. The gap between the quickstart and a billing system you can sleep through is filled by a small number of patterns that almost no tutorial mentions.

Verify the signature, not just the source

The first thing your handler does should not be to parse the JSON. It should be to verify the Stripe-Signature header against the raw request body using your endpoint signing secret. Stripe's official libraries do this with one call, but the failure mode is subtle: if any middleware in your stack has already parsed the body as JSON and you verify against a re-serialized copy, the signature check fails because the bytes are no longer identical to what Stripe signed. Either configure your framework to give you the raw body for this route, or do the parsing yourself after verification. Many Stripe webhook bugs in production start here.
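To make the raw-body point concrete, here is a minimal standard-library sketch of the check. It assumes the v1 scheme, in which the header carries `t=<timestamp>` and `v1=<hex digest>` and the digest is an HMAC-SHA256 over `<timestamp>.<raw body>`. The function name, the five-minute tolerance, and the placeholder secret are illustrative choices, not anything the official library requires.

```python
import hashlib
import hmac
import json
import time


def verify_stripe_signature(raw_body, header, secret, tolerance=300, now=None):
    """Return True if `header` carries a valid v1 signature for `raw_body` (bytes)."""
    timestamp, candidates = None, []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not candidates:
        return False
    try:
        signed_at = int(timestamp)
    except ValueError:
        return False

    # Reject old timestamps so a captured request cannot be replayed forever.
    current = time.time() if now is None else now
    if current - signed_at > tolerance:
        return False

    # The signed payload is the timestamp, a dot, and the untouched request bytes.
    signed_payload = timestamp.encode() + b"." + raw_body
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)


# Demo with a made-up secret and body (both are placeholders).
secret = "whsec_example_only"
body = b'{"id": "evt_123",  "type": "checkout.session.completed"}'
ts = "1700000000"
sig = hmac.new(secret.encode(), ts.encode() + b"." + body, hashlib.sha256).hexdigest()
header = f"t={ts},v1={sig}"

ok_raw = verify_stripe_signature(body, header, secret, now=1700000010)

# Middleware that parses and re-serializes changes the bytes (here, whitespace).
reserialized = json.dumps(json.loads(body)).encode()
ok_reparsed = verify_stripe_signature(reserialized, header, secret, now=1700000010)
```

In real code the official Python library's `stripe.Webhook.construct_event(payload, sig_header, endpoint_secret)` is the one call to reach for. The sketch only shows why the payload you hand it must be the exact bytes that arrived.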

Idempotency is not optional, it is the contract

Stripe will deliver the same event more than once. This is not a hypothetical: Stripe retries whenever your endpoint fails to return a 2xx before the delivery times out, whether the handler hung, the process restarted, or a downstream call surfaced as a 5xx. In live mode those retries continue with exponential backoff for up to three days. Your handler must produce the same observable effect on the second delivery as on the first.

The reliable pattern: store every event.id in a table with a unique index, and treat insertion as the boundary between "first time we have seen this" and "duplicate, do nothing." Insert the ID and perform the work inside one transaction, then commit. If the insert fails on the unique index, return 200 immediately. If the work fails, the rollback removes the ID again, so Stripe's retry gets a clean attempt. This is two lines of SQL more than the naive version, and it is the difference between charging a customer once and charging them three times.
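A minimal sketch of that boundary, using SQLite from the standard library. The table names, the flattened event shape (`id`, `customer`, `plan`), and the `grants` table standing in for "the work" are all hypothetical. The part that matters is that the dedupe insert and the work share one transaction.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY gives the unique index that makes duplicates fail loudly.
    conn.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE grants (customer TEXT, plan TEXT)")
    return conn


def handle_once(conn, event):
    """Apply the event's effect at most once. Returns False for a duplicate."""
    try:
        with conn:  # one transaction: commit on success, roll back on any error
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event["id"],),
            )
            conn.execute(
                "INSERT INTO grants (customer, plan) VALUES (?, ?)",
                (event["customer"], event["plan"]),
            )
    except sqlite3.IntegrityError:
        # Unique index hit: we have seen this event, so do nothing.
        return False
    return True


conn = make_db()
event = {"id": "evt_1", "customer": "cus_A", "plan": "pro"}

first = handle_once(conn, event)
second = handle_once(conn, event)  # simulated redelivery
grant_count = conn.execute("SELECT COUNT(*) FROM grants").fetchone()[0]
```

Either way the HTTP layer answers 200. The return value only tells you whether this delivery did the work or recognised a repeat.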

Acknowledge fast, do work async

Stripe expects a 2xx response within seconds. Real billing logic — updating subscription state, provisioning seats, sending the welcome email, syncing to your CRM — takes longer than that, especially when downstream services are slow. The pattern is to split the handler into two pieces: a thin acknowledger that records the event in your database and returns 200 in milliseconds, and a worker that processes events from that table.

This is the same architectural move as a small SQLite job queue — webhooks are just one source of jobs. The worker can take its time, retry on its own schedule, and crash and recover without Stripe ever noticing.
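The split can be sketched in a few dozen lines. Everything here is illustrative: the `inbox` table, the `acknowledge` and `work_once` names, and the deliberately flaky processor. The sketch also assumes signature verification has already happened before `acknowledge` is called.

```python
import json
import sqlite3


def make_inbox():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE inbox (
               event_id TEXT PRIMARY KEY,
               payload  TEXT NOT NULL,
               status   TEXT NOT NULL DEFAULT 'pending')"""
    )
    return conn


def acknowledge(conn, raw_body):
    """Record the event and return the HTTP status to send. No business logic."""
    event = json.loads(raw_body)
    with conn:
        # OR IGNORE makes a redelivery of the same event a no-op.
        conn.execute(
            "INSERT OR IGNORE INTO inbox (event_id, payload) VALUES (?, ?)",
            (event["id"], raw_body.decode()),
        )
    return 200


def work_once(conn, process):
    """Run one pass over pending events; return how many completed."""
    rows = conn.execute(
        "SELECT event_id, payload FROM inbox WHERE status = 'pending' ORDER BY rowid"
    ).fetchall()
    done = 0
    for event_id, payload in rows:
        try:
            process(json.loads(payload))
        except Exception:
            continue  # stays pending; the next pass retries it
        with conn:
            conn.execute("UPDATE inbox SET status = 'done' WHERE event_id = ?", (event_id,))
        done += 1
    return done


conn = make_inbox()
body = b'{"id": "evt_1", "type": "invoice.paid"}'
statuses = [acknowledge(conn, body), acknowledge(conn, body)]

seen = []
attempts = {"n": 0}


def flaky(event):
    # Fails on the first call to imitate a slow downstream service.
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("downstream is slow")
    seen.append(event["type"])


first_pass = work_once(conn, flaky)
second_pass = work_once(conn, flaky)
```

A crash between `process` and the status update would cause the event to be processed again on the next pass, so the processing step still needs to be idempotent.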

Out-of-order events are normal

Stripe does not guarantee delivery order. The customer.subscription.created event for a customer can arrive after the invoice.paid event for that same subscription. The handler that updates "current state" by overwriting it from the latest webhook will get the wrong answer when those two events arrive in the wrong order, because the older event's "state" is staler than what's already in your database.

The fix is to write events to an append-only log first, and derive current state by reducing over the log in event-time order rather than receive-time order. Every event carries a created timestamp in Unix seconds; sort by that, and pick a deterministic tie-break for events created within the same second. This sounds like overkill until the third time you have to debug why a paid customer is marked as cancelled.
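A small sketch of the difference between the two orderings. The events are trimmed-down dictionaries using the real `id`, `created`, `type`, and `data.object` fields. The IDs and timestamps are invented, and the tie-break on `id` is an arbitrary but deterministic choice of this sketch, not something Stripe specifies.

```python
def status_by_arrival(log):
    """Naive approach: whatever arrived last wins."""
    return log[-1]["data"]["object"]["status"]


def status_by_event_time(log):
    """Reduce over the log in event-time order; the newest event by `created` wins."""
    status = None
    for event in sorted(log, key=lambda e: (e["created"], e["id"])):
        status = event["data"]["object"]["status"]
    return status


# Append-only log in the order the webhooks were received.
# The older "past_due" update was delayed and arrived after the newer "active" one.
received = [
    {
        "id": "evt_b",
        "created": 200,
        "type": "customer.subscription.updated",
        "data": {"object": {"status": "active"}},
    },
    {
        "id": "evt_a",
        "created": 100,
        "type": "customer.subscription.updated",
        "data": {"object": {"status": "past_due"}},
    },
]

naive = status_by_arrival(received)
derived = status_by_event_time(received)
```

Here `naive` ends up as the stale `past_due`, while `derived` is `active`. The log itself is never rewritten, which also means the derived state can be recomputed whenever the reducer changes.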

Use Stripe as the source of truth, not your cache

The strongest discipline in a Stripe integration is the rule that Stripe is canonical and your database is a cache. When the two disagree, Stripe wins. This means: do not read subscription state from your local table to decide whether to bill; ask Stripe. Do not infer plan changes from your records; query the subscription. Your local copy exists to make queries fast and to survive Stripe being down for a minute, not to be authoritative.

This rule has a corollary: the recovery procedure when your webhook handler has been broken for an hour is not "panic," it is "page through Stripe's events list and replay the missed ones." Stripe stores 30 days of events for exactly this purpose. If your handler can replay an arbitrary event from history without side-effect divergence (because you have idempotency right), recovery is one CLI command. If it can't, recovery is a Sunday afternoon.
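The paging half of that recovery might look like the sketch below. It assumes the list is returned newest first and paginated with a `starting_after` cursor and a `has_more` flag. `fetch_page` is a hypothetical callable that would wrap the real events list call, and the in-memory fake exists only so the sketch runs offline.

```python
def collect_missed_events(fetch_page, since, page_size=100):
    """Return events with created >= since, oldest first, from a newest-first list."""
    missed = []
    cursor = None
    while True:
        page = fetch_page(cursor, page_size)
        for event in page["data"]:
            if event["created"] < since:
                # Everything further down the list is older still, so stop here.
                return list(reversed(missed))
            missed.append(event)
        if not page["has_more"] or not page["data"]:
            break
        cursor = page["data"][-1]["id"]  # continue after the last item seen
    return list(reversed(missed))


def make_fake_fetch_page(events_newest_first):
    """In-memory stand-in for the list endpoint (an assumption for this demo)."""
    def fetch_page(starting_after, limit):
        start = 0
        if starting_after is not None:
            ids = [e["id"] for e in events_newest_first]
            start = ids.index(starting_after) + 1
        chunk = events_newest_first[start:start + limit]
        return {"data": chunk, "has_more": start + limit < len(events_newest_first)}
    return fetch_page


# evt_5 (created 1005) down to evt_1 (created 1001), newest first.
history = [{"id": f"evt_{n}", "created": 1000 + n} for n in range(5, 0, -1)]

# Suppose the handler broke at time 1003: replay everything from then on.
to_replay = collect_missed_events(make_fake_fetch_page(history), since=1003, page_size=2)
replay_ids = [e["id"] for e in to_replay]
```

Feeding `to_replay` through an idempotent handler in this oldest-first order makes the replay safe even when some of those events were already processed before the outage.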

Test the bad paths, not just the happy path

The Stripe CLI lets you trigger test events against your local handler with one command. The teams that get this right test eight scenarios at minimum: signature mismatch (must 400), duplicate event (must 200 without re-processing), out-of-order events (must converge to right state), expired subscription (must downgrade access), failed payment (must trigger dunning, not silently delete), refund (must reverse provisioning), partial refund (must adjust, not cancel), and dispute (must flag, not act). Most of these map to a Stripe event you can fire from the CLI in a few seconds; the signature-mismatch and duplicate cases you simulate by sending a bad header or delivering the same event twice. None of them gets tested in the quickstart.

For inspecting what your handler actually receives in real production traffic — not what Stripe's docs say you receive — WebhookVault captures the raw HTTP requests with headers and body intact. The single most common production discovery is "the field I assumed was always present is actually nullable for legacy customers."

The smaller pattern

The summary fits in seven rules. Verify the signature on the raw body. Dedupe by event ID. Acknowledge in milliseconds, work async. Sort by event timestamp, not arrival. Treat Stripe as canonical. Test the bad paths with the CLI. Watch the wire when you don't trust the docs. None of these are clever. All of them are non-obvious until they have cost you a real customer or a real night of sleep.

Written by

Vera

Engineering researcher. APIs, databases, infrastructure, systems design.
