API Webhook Replay vs Resend: Two Patterns That Serve Different Customer Needs
Customers ask for webhook replay and webhook resend as if they were the same feature. They are not. Building both with the same primitive produces confusing semantics. Building them with the right distinct primitives is two days of work that saves months of support tickets.
Every webhook-emitting API eventually gets the same customer request: can you replay the events from yesterday? What the customer actually wants splits into two distinct features that share almost no implementation, and conflating them is one of the most common API design mistakes in this space. We have built and rebuilt this pattern across DocuMint, CronPing, FlagBit, and WebhookVault, and the distinction below is what we wish we had drawn before the first round of customer support requests.
What the two requests actually mean
The first request is replay: "I want you to send the events that you previously delivered (or attempted to deliver) to my endpoint, exactly as they were originally sent, so I can re-process them through my system." The use case is almost always that the customer's endpoint was broken or their processing logic had a bug, the original event arrived and was acknowledged or failed, and they want to reprocess from a known starting point.
The second request is resend: "I want you to fetch the current state of an entity and send me a synthetic event reflecting that state, so my system can sync with yours." The use case is almost always that the customer is building a new integration or recovering from a longer outage, the original events are no longer available or are stale, and they want a snapshot of the world right now.
These look similar from the API surface but require fundamentally different infrastructure. Replay requires that you stored the original event payloads. Resend requires that you can construct an event payload from current entity state. A customer who asks for replay and gets resend will get a payload that does not match their event log; a customer who asks for resend and gets replay will get stale data that does not reflect changes that happened during the outage.
The replay implementation
Replay is built on the event store. The events table needs to retain the full original payload, the original delivery attempts, and enough metadata to recompose the delivery exactly: the original event ID, the original timestamp, the original signature, the headers that were sent. The replay endpoint accepts a list of event IDs (or a query like all events between T1 and T2) and triggers fresh delivery attempts using the original payloads.
The minimum schema is the standard webhook events table plus a delivery attempts table:
CREATE TABLE webhook_events (
    id BIGSERIAL PRIMARY KEY,
    account_id BIGINT NOT NULL,
    event_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE webhook_deliveries (
    id BIGSERIAL PRIMARY KEY,
    webhook_event_id BIGINT NOT NULL REFERENCES webhook_events(id),
    subscription_id BIGINT NOT NULL,
    attempt INT NOT NULL,
    status_code INT,
    response_body TEXT,
    attempted_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    is_replay BOOLEAN NOT NULL DEFAULT FALSE
);
The is_replay flag is important: customers' systems that depend on event ordering or idempotency need to be able to distinguish a replayed delivery from a fresh one. The standard practice is also to include a header like X-Webhook-Replay: true on replayed deliveries so the receiver can branch on it.
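As a rough illustration of how a replayed delivery could be assembled from the tables above, here is a minimal Python sketch. The StoredEvent class, the build_replay_delivery function, and the X-Webhook-Event-Id header name are hypothetical names chosen for this example; only X-Webhook-Replay and the column names come from the text. It assumes the payload is re-serialized from the stored JSONB value. JSONB does not preserve the original whitespace or key order, so a signature computed over the raw request bytes would only match if the raw body were stored as well.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class StoredEvent:
    """One row of webhook_events (hypothetical in-memory shape)."""
    id: int
    event_id: str
    event_type: str
    payload: dict[str, Any]


def build_replay_delivery(event: StoredEvent, subscription_id: int, attempt: int):
    """Return (headers, body, delivery_row) for a replayed delivery.

    The event ID and payload are taken from the stored event unchanged;
    only the replay marker and the delivery record are new.
    """
    headers = {
        "Content-Type": "application/json",
        "X-Webhook-Event-Id": event.event_id,  # assumed header name
        "X-Webhook-Replay": "true",
    }
    body = json.dumps(event.payload, separators=(",", ":"), sort_keys=True)
    delivery_row = {
        "webhook_event_id": event.id,
        "subscription_id": subscription_id,
        "attempt": attempt,
        "is_replay": True,  # stored so the delivery history can show replays
    }
    return headers, body, delivery_row
```

The actual HTTP send and the INSERT into webhook_deliveries are left out; the point is only that nothing about the event itself is regenerated.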
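Replay is naturally idempotent on the consumer side because the event ID is unchanged. A consumer that keys side effects on event ID will see a replay of an event it already applied as a duplicate of the original and skip it, while an event that never finished processing is applied for the first time. This is the right behavior for the canonical use case of "my consumer was broken and I want to reprocess": the consumer's deduplication is what protects against double-application. The caveat is a consumer that recorded an event as processed even though the bug mishandled it; that consumer has to clear those records, or branch on the replay header, before the replay has any effect.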
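The consumer-side deduplication described here can be sketched in a few lines of Python. IdempotentConsumer is a hypothetical name, and the in-memory set stands in for whatever durable store a real consumer would use; the sketch assumes an event is recorded as processed only after its side effect succeeds.

```python
from typing import Any, Callable


class IdempotentConsumer:
    """Applies each event ID at most once (in-memory sketch)."""

    def __init__(self, apply: Callable[[dict[str, Any]], None]) -> None:
        self._apply = apply
        self._processed: set[str] = set()

    def handle(self, event_id: str, payload: dict[str, Any]) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        if event_id in self._processed:
            return False  # already applied; a replay of it is skipped
        self._apply(payload)  # may raise; then the ID is not recorded
        self._processed.add(event_id)
        return True
```

Under this scheme an event whose handler raised during the outage is not in the processed set, so its replay is applied, while events that went through cleanly are skipped.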
The resend implementation
Resend is built on the current state of entities. The resend endpoint accepts an entity ID (or a query like all entities of type X) and constructs synthetic events from the current state. The event payloads are fresh: a new event ID, the current timestamp, the current entity state.
This is structurally a different feature: the event store is not consulted, the original delivery records are not relevant, and the payload is generated by the same code that generates fresh events for state changes. The minimum API surface is a single endpoint that takes the entity type, the entity IDs, and the target subscription in the request body:
POST /api/v1/webhooks/resend
{
    "entity_type": "invoice",
    "entity_ids": [123, 456, 789],
    "subscription_id": 42
}
The endpoint returns a job ID; the resend happens asynchronously and the synthetic events appear at the customer's endpoint with an X-Webhook-Resend: true header and an event type like invoice.snapshot that is distinct from the entity's normal lifecycle events.
The synthetic event type matters: customers must be able to recognize a snapshot event as different from an entity-creation or entity-update event. A customer who treats an invoice.snapshot as an invoice.created will create duplicate downstream entities. The naming convention should make this distinction unmissable.
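To make the contrast with replay concrete, here is a minimal Python sketch of how a synthetic snapshot event might be built from current entity state. The build_snapshot_event function, the evt_ prefix, and the field names event_id, event_type, created_at, and data are assumptions for illustration; only the .snapshot suffix and the X-Webhook-Resend header come from the text.

```python
import uuid
from datetime import datetime, timezone
from typing import Any

# Header attached to every resend delivery, as described above.
RESEND_HEADERS = {"X-Webhook-Resend": "true"}


def build_snapshot_event(entity_type: str, entity: dict[str, Any]) -> dict[str, Any]:
    """Build a synthetic event from the entity's current state.

    Nothing is read from the event store: the event ID and timestamp are
    new, and the event type is deliberately not a lifecycle type.
    """
    return {
        "event_id": f"evt_{uuid.uuid4().hex}",  # fresh ID, assumed format
        "event_type": f"{entity_type}.snapshot",
        "created_at": datetime.now(timezone.utc).isoformat(),
        "data": dict(entity),  # copy of the current state
    }
```

Because each call mints a new event ID, a consumer's event-ID deduplication does not protect it here; the distinct event type is what keeps a snapshot from being handled as a creation.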
Per-subscription scoping
Both endpoints need account-scoped authorization that resolves to specific subscriptions, not just to event types. A customer with multiple webhook subscriptions to the same event type (development, staging, production endpoints) almost always wants to replay or resend to a specific subscription, not all of them. The subscription ID should be an explicit field in the request body, never implied.
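A minimal Python sketch of that scoping check follows. The resolve_subscription function, the SubscriptionNotFound exception, and the dictionary shape of a subscription are hypothetical; the sketch assumes the caller's account ID has already been established by authentication.

```python
from typing import Any


class SubscriptionNotFound(Exception):
    """Raised when the subscription is missing or owned by another account."""


def resolve_subscription(
    subscriptions: dict[int, dict[str, Any]],
    account_id: int,
    subscription_id: int,
) -> dict[str, Any]:
    """Return the subscription only if it belongs to the calling account."""
    sub = subscriptions.get(subscription_id)
    # The same error for "missing" and "not yours" avoids leaking which
    # subscription IDs exist on other accounts.
    if sub is None or sub["account_id"] != account_id:
        raise SubscriptionNotFound(subscription_id)
    return sub
```

Both the replay and the resend handler would call this once, before enqueuing any deliveries.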
Bulk operations need rate limiting, ideally as a per-account queue that processes resends and replays at a rate the customer's endpoint can absorb. The default rate should be much lower than normal event delivery, because bulk operations are typically used for catch-up after an outage and the customer's endpoint is often still recovering. Ten events per second per subscription is a reasonable starting point.
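One simple way to hold a bulk job to a fixed rate is to space deliveries evenly, sketched below in Python. Pacer is a hypothetical helper, not part of any library; it assumes one instance per subscription and a single worker draining that subscription's queue. The clock and sleep functions are injectable so the pacing can be checked without real waiting.

```python
import time
from typing import Callable


class Pacer:
    """Spaces calls to wait() at least 1/rate seconds apart."""

    def __init__(
        self,
        rate_per_second: float = 10.0,  # the starting point suggested above
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_at = float("-inf")  # first call never waits

    def wait(self) -> None:
        """Block until the next delivery slot, then reserve the one after it."""
        now = self._clock()
        if now < self._next_at:
            self._sleep(self._next_at - now)
            now = self._next_at
        self._next_at = now + self._interval
```

A worker would call pacer.wait() before each delivery in the bulk job. Even spacing is gentler on a recovering endpoint than a token bucket that allows bursts, which is the reason for choosing it in this sketch.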
The dashboard surface
Both operations need dashboard exposure because the support cost of leaving them as API-only features is too high. The dashboard should show:
- Per-event delivery history with a replay button on failed deliveries
- Per-entity resend button on entity detail pages
- Bulk replay dialog with filter by event type, time range, status
- Bulk resend dialog with filter by entity type, attribute
- Job status page for asynchronous bulk operations
The dashboard buttons should generate the same API calls a programmatic integration would make. This guarantees feature parity between dashboard and API and means the dashboard becomes a debugging tool for customers who want to understand what their API calls would do.
What customers actually need
From years of support tickets across the four products, the breakdown of customer-need-vs-customer-language is roughly:
- Asks for "replay," needs replay: 40%. The customer's consumer had a bug, they fixed it, they want the originals reprocessed.
- Asks for "replay," needs resend: 40%. The customer's system is out of sync (often building a new integration or recovering from a long outage) and they want the current state.
- Asks for "resend," needs replay: 5%. Rare, usually a system where the customer thinks of every event as a state snapshot.
- Asks for "resend," needs resend: 15%. The customer correctly recognizes they want current state, not historical events.
The takeaway is that although 80% of customers say "replay," half of those customers actually need resend. The dashboard distinction needs to be drawn loudly so customers self-select correctly without a support ticket.
The deeper observation
Customer language for webhook recovery flattens into a single word, while the underlying problem space splits into two distinct features. Building both with the same primitive (the most common shortcut) means one of the two features always behaves incorrectly, because a single code path cannot return both original payloads (for replay) and current state (for resend). Recognizing the customer-language collapse and building distinct primitives behind distinct dashboard surfaces is two days of engineering work. It saves an indefinite stream of confusing support tickets in which the customer and the engineer use the same word to mean different things.