Designing API Webhook Signing Secret Rotation: How to Change Keys Without Breaking Customers

Most webhook documentation describes the signing mechanism in detail. Far fewer documents describe how to rotate the signing secret without breaking every customer integration. The patterns that work are simple but underused, and the absence of a documented rotation story is one of the reliable signs that a webhook integration has not yet had a security incident.

Why rotation matters

The signing secret is the shared secret between sender and receiver that proves a webhook payload came from the sender and was not modified in transit. The standard implementation is HMAC-SHA256 over the raw request body, with the secret as the key and the signature sent in a header. The receiver computes the same HMAC over the body and compares it against the header value, using a timing-safe comparison so that response timing does not leak how much of the signature matched.
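The sign-and-verify step described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names sign_body and verify_body are hypothetical and not part of any product API.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Return the hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_body(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Recompute the HMAC and compare it with the header value in constant time."""
    expected = sign_body(secret, body)
    # compare_digest takes time independent of where the first mismatch occurs.
    # Encoding to bytes avoids a TypeError on non-ASCII header values.
    return hmac.compare_digest(expected.encode(), received_signature.encode())
```

The body must be the exact bytes received on the wire. Re-serializing parsed JSON before hashing usually changes the bytes and makes verification fail.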

The signing secret needs to rotate for the same reasons any long-lived secret needs to rotate: the secret may have been exposed (committed to a public repo, logged inadvertently, captured in a screenshot, copied to a third-party tool), it may be subject to a policy that requires periodic rotation regardless of exposure, or it may have been issued to an employee who has since left the company. In each case, the response is to retire the old secret and start using a new one.

The challenge is that the rotation has to be coordinated between sender and receiver. If the sender simply switches to a new secret, every webhook signed with the new secret will fail verification on the receiver until the receiver updates its configuration. If the receiver simply switches to a new secret, every webhook still in flight signed with the old secret will fail verification. The naive rotation is a small outage either way.

The multiple-active-secrets pattern

The pattern that works is to support multiple active secrets per endpoint, with explicit lifecycle management. The schema looks like this:

CREATE TABLE webhook_endpoints (
  id BIGINT PRIMARY KEY,
  account_id BIGINT NOT NULL,
  url TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL
);

CREATE TABLE webhook_signing_secrets (
  id BIGINT PRIMARY KEY,
  endpoint_id BIGINT NOT NULL REFERENCES webhook_endpoints(id),
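  -- One-way hash of the secret. HMAC signing needs the secret itself at send
  -- time, so the signer must also hold it in recoverable (e.g. encrypted) form.
  secret_hash TEXT NOT NULL,
  -- Non-sensitive identifier, safe to show in the dashboard and in headers.
  prefix TEXT NOT NULL,
  active BOOLEAN NOT NULL DEFAULT TRUE,
  created_at TIMESTAMPTZ NOT NULL,
  -- NULL while the secret is still active.
  retired_at TIMESTAMPTZ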
);

The endpoint has multiple secrets associated with it, each with its own lifecycle. The sender signs each outgoing webhook with the most recently created active secret. The receiver, when verifying, accepts a webhook as valid if it matches any of the currently-active secrets for that endpoint.

The rotation procedure is then:

  1. Customer requests a new secret. The system generates a new secret, stores its hash, and marks it active. The customer is shown the plaintext exactly once, with instructions to store it securely.
  2. Customer updates their receiver to accept either the old or the new secret during the transition window. The receiver code typically checks both secrets and accepts the webhook if either matches.
  3. Customer confirms the receiver is updated. The system continues signing with the most recently created active secret (the new one), while the old secret remains valid for verification.
  4. After the transition window expires (typically 24-48 hours), the old secret is marked retired. From then on, signatures made with the retired secret should no longer be accepted, and the receiver can drop the old secret from its configuration.

The transition window is the load-bearing detail. It needs to be long enough that the receiver has time to deploy the change and propagate it to all instances, but short enough that the rotation completes within a reasonable operational window. A window of 24-48 hours is a reasonable default for most B2B SaaS, with an explicit customer-configurable extension of up to a week for customers with slow deployment processes.
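The lifecycle can be sketched as two small functions: one that starts a rotation and schedules the old secret's retirement, and one that a periodic job could call to retire whatever is due. The retire_after field, the function names, and the 48-hour default with a 7-day cap are assumptions made for this illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

DEFAULT_WINDOW = timedelta(hours=48)  # assumed default
MAX_WINDOW = timedelta(days=7)        # assumed upper bound for extensions


@dataclass
class SecretRecord:
    prefix: str
    created_at: datetime
    active: bool = True
    retire_after: Optional[datetime] = None  # hypothetical scheduling field
    retired_at: Optional[datetime] = None


def start_rotation(records: List[SecretRecord], new_prefix: str, now: datetime,
                   window: timedelta = DEFAULT_WINDOW) -> SecretRecord:
    """Add a new active secret and schedule existing ones for retirement."""
    if window <= timedelta(0) or window > MAX_WINDOW:
        raise ValueError("transition window must be positive and at most 7 days")
    for record in records:
        if record.active and record.retire_after is None:
            record.retire_after = now + window
    new_record = SecretRecord(prefix=new_prefix, created_at=now)
    records.append(new_record)
    return new_record


def retire_due(records: List[SecretRecord], now: datetime) -> List[SecretRecord]:
    """Mark every secret whose transition window has elapsed as retired."""
    retired = []
    for record in records:
        if record.active and record.retire_after is not None and record.retire_after <= now:
            record.active = False
            record.retired_at = now
            retired.append(record)
    return retired
```

Keeping the scheduled time on the record, rather than computing it at retirement time, means a later change to the default does not silently shorten a rotation that is already in progress.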

The signature header format

The signature header carries a timestamp and a versioned signature, and it can optionally indicate which key was used to sign the webhook so the receiver can quickly identify the right secret to verify against. A common base format is something like:

X-Webhook-Signature: t=1715000000,v1=hex(hmac_sha256(secret, "t=1715000000." + body))

The format includes the timestamp (for replay protection) and the version prefix (for migration to different signing algorithms in the future). Because the timestamp is part of the signed string, an attacker cannot alter it without invalidating the signature. The receiver first verifies that the timestamp is within a tolerance window (5 minutes is a common choice), then computes the HMAC over the same input string and compares it against the v1 value.
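A sketch of building and checking a header in this format, assuming the signed string is exactly "t=<timestamp>." followed by the raw body as in the example above. The function names and the now parameter (injected to make the check testable) are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # 5 minutes


def _digest(secret: bytes, timestamp: str, body: bytes) -> str:
    signed = f"t={timestamp}.".encode() + body
    return hmac.new(secret, signed, hashlib.sha256).hexdigest()


def build_header(secret: bytes, body: bytes, timestamp: int) -> str:
    """Sender side: produce the value of the signature header."""
    return f"t={timestamp},v1={_digest(secret, str(timestamp), body)}"


def verify_header(secret: bytes, body: bytes, header: str,
                  now: Optional[float] = None) -> bool:
    """Receiver side: check the timestamp tolerance, then the v1 signature."""
    fields = dict(part.split("=", 1) for part in header.split(",") if "=" in part)
    try:
        raw_timestamp = fields["t"]
        received = fields["v1"]
        timestamp = int(raw_timestamp)
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if abs(current - timestamp) > TOLERANCE_SECONDS:
        return False  # too old or too far in the future: possible replay
    expected = _digest(secret, raw_timestamp, body)
    return hmac.compare_digest(expected.encode(), received.encode())
```

The timestamp string from the header is reused verbatim when rebuilding the signed input, so the receiver hashes exactly what the sender hashed.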

The optional addition for rotation is a key identifier in the header:

X-Webhook-Signature: t=1715000000,kid=whsec_v2_abc123,v1=hex(...)

The kid (key ID) tells the receiver which of the active secrets was used to sign this webhook, so the receiver can verify against only that secret rather than trying all active secrets. The kid is the prefix from the webhook_signing_secrets table, which is non-sensitive and can appear in the header in plaintext.

Including the kid is not strictly necessary if the number of active secrets is small (the receiver can just try all of them), but it is a useful optimization at scale and a useful diagnostic for debugging.

The dashboard surface

The customer-facing dashboard for rotation needs four operations:

  • List the secrets for an endpoint, with prefix (not full secret), creation time, and active/retired status.
  • Create a new secret, returning the plaintext exactly once with instructions.
  • Retire a secret, with a confirmation step explaining that webhooks signed with the retired secret will start failing verification.
  • View recent verification failures, broken down by which secret was attempted and why verification failed.

The recent-failures view is the most underrated of the four. During a rotation, the most common failure mode is that the receiver did not pick up the new secret correctly, or that one of multiple receiver instances has stale configuration. The verification failure view shows the customer exactly which webhooks failed and why, which lets them diagnose without filing a support ticket.

What goes wrong

The most common rotation failure is that the customer retires the old secret before the receiver is actually using the new secret. This produces a sudden cliff of verification failures starting at the retirement time. The mitigation is a confirmation flow that requires the customer to acknowledge they have updated their receiver before retiring, and a brief delay (15 minutes is common) between the retirement request and the actual retirement to allow rollback if needed.
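The acknowledgement and the delayed retirement can be modeled as a pending request that takes effect only after the delay and can be cancelled before then. The class, the function names, and the fixed 15-minute constant are assumptions for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RETIREMENT_DELAY = timedelta(minutes=15)  # assumed grace period


@dataclass
class PendingRetirement:
    prefix: str
    requested_at: datetime
    cancelled: bool = False

    @property
    def effective_at(self) -> datetime:
        return self.requested_at + RETIREMENT_DELAY


def request_retirement(prefix: str, acknowledged_receiver_updated: bool,
                       now: datetime) -> PendingRetirement:
    """Accept a retirement request only with the customer's acknowledgement."""
    if not acknowledged_receiver_updated:
        raise ValueError("customer must confirm the receiver uses the new secret")
    return PendingRetirement(prefix=prefix, requested_at=now)


def cancel_retirement(pending: PendingRetirement, now: datetime) -> bool:
    """Roll back the request if the delay has not yet elapsed."""
    if pending.cancelled or now >= pending.effective_at:
        return False
    pending.cancelled = True
    return True


def is_retired(pending: PendingRetirement, now: datetime) -> bool:
    """The secret counts as retired only after the delay, unless cancelled."""
    return not pending.cancelled and now >= pending.effective_at
```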

The second common failure is that the customer's receiver is split across multiple instances and the configuration update is uneven. Some instances pick up the new secret quickly, others have caches or environment-variable values that take longer to refresh. The mitigation is the multiple-active-secrets pattern itself: as long as both secrets are active, the receiver instances can pick up the new one at their own pace.

The third failure is that the customer never actually deployed the receiver change. The system signs with the new secret, the receiver still expects the old, every webhook fails verification, and the customer's product breaks. The mitigation is to surface verification failure rates prominently in the dashboard and to alert customers when verification failure rates spike during the transition window.
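A spike check of this kind can be very small. The sketch below assumes the sender can tell from each delivery's outcome (for example, the receiver's response status) whether verification passed; the threshold, the minimum sample size, and the function names are illustrative assumptions, not recommended values.

```python
from typing import Sequence

ALERT_THRESHOLD = 0.20  # assumed: alert at 20% failures or more
MIN_DELIVERIES = 20     # assumed: ignore very small samples


def failure_rate(outcomes: Sequence[bool]) -> float:
    """outcomes[i] is True when delivery i passed verification."""
    if not outcomes:
        return 0.0
    failures = sum(1 for ok in outcomes if not ok)
    return failures / len(outcomes)


def should_alert(outcomes: Sequence[bool], in_transition_window: bool) -> bool:
    """Alert only during a rotation, and only with enough recent deliveries."""
    if not in_transition_window or len(outcomes) < MIN_DELIVERIES:
        return False
    return failure_rate(outcomes) >= ALERT_THRESHOLD
```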

The application across our four products

WebhookVault is the most rotation-relevant of our four products because it is the webhook-receiving product. Customers configure WebhookVault to receive webhooks from third parties and forward them to their own infrastructure, and the secret-rotation story applies on both sides: WebhookVault needs to handle rotations from the third parties whose webhooks it receives, and customers may want to rotate the secrets used between WebhookVault and their own receivers.

CronPing sends webhooks for monitor state changes, and supports the multiple-active-secrets pattern for the outgoing direction. Customers can rotate the signing secret for their monitor webhooks without coordinating a maintenance window with us.

FlagBit sends webhooks for flag changes and supports the same pattern. DocuMint sends webhooks for invoice generation completion (less frequent than the other products) and supports the pattern for completeness.

The shared rotation infrastructure is one of the small benefits of having a single codebase strategy across the four products. The webhook signing and rotation code is the same across all four, with the differences being which events get sent and what schemas the payloads follow.

Three observations

First: secret rotation is one of the operational features that customers do not know they need until they have a security incident. The feature also builds trust over time: customers who have rotated keys without incident have a better operational relationship with the vendor than customers who have never rotated. The rotation story should be documented as a first-class operation, not a hidden feature only available to customers who file support tickets.

Second: the multiple-active-secrets pattern is structurally similar to the API key rotation pattern, the JWT signing key rotation pattern, the TLS certificate rotation pattern, and several other long-lived-secret patterns. The recurring theme is overlap-during-transition rather than instant-cutover. The pattern is general enough that a team that masters it for one use case can apply it across multiple security primitives.

Third: the dashboard surface for rotation is more important than the underlying mechanism. The cryptography is well-understood and the multiple-active-secrets schema is a small change. What customers actually use is the create-new-secret button, the confirmation flows, and the verification-failure view. Investing in the customer-facing surface pays back more than investing in additional cryptographic features that customers do not interact with.

The deeper observation is that webhook integration is a domain where the developer experience of operational features like rotation matters as much as technical features like signature verification. The receiver-side complexity that the sender exposes (multiple secrets to manage, rotation procedures to follow, failure modes to diagnose) is part of the product. The difference between a product where rotation is a five-minute self-service operation and one where it is a multi-hour coordinated effort is the difference between a product customers trust for production traffic and one they do not. The mechanism is small and the polish around it is most of the work.
