Designing Webhook Subscription Validation: How to Reject Bad URLs Before Customers Notice
A webhook subscription endpoint that accepts any URL eventually accepts URLs that point to localhost, private networks, expired domains, and the wrong customer. Validating at subscription time is much cheaper than validating during every delivery.
A webhook subscription endpoint accepts a customer-provided URL and starts sending HTTP requests to it. The naive implementation accepts any string, stores it as the destination, and lets the delivery worker discover problems later. This works for the happy path. It fails badly for the unhappy path, which is the path the customer is most likely to remember.
The failure modes are predictable. A customer types http://localhost:3000/webhook in their staging dashboard and forgets to change it before going to production; the webhooks deliver to nothing for a week before anyone notices. A customer types https://192.168.1.5/hook, which works on their VPN but not from our delivery infrastructure. A customer's domain expires, and the DNS now points to a parking page that returns 200 OK on every POST, so our delivery counters look healthy while the customer is silently missing every event. A customer accidentally enters another customer's URL, and personally identifiable data starts flowing to the wrong place.
Validating at subscription time is much cheaper than validating during every delivery. The customer is interactively waiting for the response and can fix typos immediately. The validation cost is paid once per subscription rather than once per event. The error messages can be specific in ways that delivery-time errors cannot easily be.
What to validate
The validation surface breaks down into four categories: URL syntax, network reachability, security policy, and ownership verification. Each has different costs and different effectiveness.
URL syntax validation is the cheapest and most universally useful. The URL should parse as a valid URL with an http or https scheme. The scheme should be https in production environments (allow http only in explicit sandbox modes). The port, if specified, should be a standard one (80, 443) or in the high-numbered range commonly used for application servers. The path is opaque and can be anything. Whether the hostname resolves to a public IP is a network-level check rather than a syntax check, because it requires a DNS lookup.
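The syntax rules above can be sketched with Python's standard library. This is a minimal illustration, not a complete validator: the function name check_url_syntax, the sandbox flag, and the choice of 1024 as the start of the "high-numbered" port range are assumptions made for the example.

```python
from urllib.parse import urlsplit

STANDARD_PORTS = {80, 443}
HIGH_PORT_MIN = 1024  # assumption: "high-numbered" means non-privileged ports


def check_url_syntax(url: str, sandbox: bool = False) -> list[str]:
    """Return a list of human-readable problems; an empty list means the URL passed."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        return ["URL could not be parsed"]

    problems = []
    if parts.scheme not in ("http", "https"):
        problems.append("URL scheme must be http or https")
    elif parts.scheme == "http" and not sandbox:
        problems.append("URL must use https")
    if not parts.hostname:
        problems.append("URL must include a hostname")
    if port is not None and port not in STANDARD_PORTS and port < HIGH_PORT_MIN:
        problems.append(f"Port {port} is not allowed")
    # The path is deliberately not inspected: it is opaque to us.
    return problems
```

Returning a list of problems rather than a single boolean keeps every rejected rule visible to the caller, which makes it easier to show the customer a specific message.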
Network-level validation catches most of the failure modes that syntax validation misses. The hostname should resolve via public DNS, which excludes localhost and .local and most internal hostnames. The resolved IP should not be in the private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), the IPv4 link-local range (169.254.0.0/16, which includes the cloud metadata address 169.254.169.254), or loopback (127.0.0.0/8); together these cover the most common mistakes. The full SSRF-prevention list also excludes carrier-grade NAT (100.64.0.0/10), IPv6 loopback (::1), IPv6 link-local (fe80::/10), and a dozen smaller ranges; the complete list is published in IANA's special-purpose address registries.
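A hedged sketch of the IP check, using Python's ipaddress module instead of a hand-maintained list of ranges. The function names and the injectable resolver parameter are assumptions for illustration; the sketch takes the stricter option of rejecting a hostname if any of its resolved addresses is non-public.

```python
import ipaddress
import socket


def is_public_ip(address: str) -> bool:
    """True only for globally routable unicast addresses."""
    ip = ipaddress.ip_address(address)
    # Judge IPv4-mapped IPv6 (::ffff:a.b.c.d) by the IPv4 address it wraps.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # is_global is False for private, loopback, link-local, and
    # carrier-grade NAT ranges; multicast is excluded separately.
    return ip.is_global and not ip.is_multicast


def resolve_public_addresses(hostname, port=443, resolver=socket.getaddrinfo):
    """Resolve hostname; raise ValueError unless every address is public."""
    try:
        infos = resolver(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise ValueError(f"Hostname {hostname!r} does not resolve") from exc
    # Drop any IPv6 zone suffix such as "%eth0" before parsing.
    addresses = sorted({info[4][0].split("%")[0] for info in infos})
    if not addresses:
        raise ValueError(f"Hostname {hostname!r} does not resolve")
    for address in addresses:
        if not is_public_ip(address):
            raise ValueError(
                f"Hostname {hostname!r} resolves to a non-public address ({address})"
            )
    return addresses
```

The resolver argument exists only so the check can be exercised without network access; in normal use the default socket.getaddrinfo is what runs.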
Security policy enforcement happens after the basic checks pass. The URL must use https in production. The domain may need to be on an allowlist (some compliance regimes require this). The port may need to be 443 only (some compliance regimes require this too). These rules are customer-configurable in mature implementations and team-defined in simpler ones.
Ownership verification is the hardest and most important. The customer says they own example.com/webhook. How do we know they actually do, and not that they typed a URL belonging to someone else by accident or malice? The standard answer is a challenge-response: send a test request to the URL with a unique nonce, require the response to echo the nonce, and only activate the subscription once the verification succeeds. Variants of this handshake are used by Slack's Events API, Dropbox webhooks, Microsoft Graph change notifications, and Meta's webhooks; AWS SNS uses a related confirmation step in which the endpoint must confirm the subscription before messages flow.
The challenge-response pattern
The basic flow has four steps. First, the customer submits the URL to subscribe. Second, the server generates a verification nonce and sends a special verification request to the URL containing the nonce. Third, the customer's endpoint receives the verification request, recognizes it as a verification, and responds with the nonce in the expected format. Fourth, the server records the subscription as verified and starts sending real events.
The verification request format matters. The verification should be distinguishable from real events; a common pattern is a special header (X-Webhook-Verification: nonce-value) or a special event type (webhook.verification) that the customer's endpoint can recognize. The response format should be straightforward; common patterns are echoing the nonce in the response body or computing an HMAC of the nonce with a customer secret.
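The server side of this exchange might look like the following sketch. It combines the header and event-type markers mentioned above and supports both response styles (plain echo, or HMAC when a customer secret exists). The function names, the JSON body shape, and the choice of SHA-256 with a hex digest are assumptions for the example, not a fixed protocol.

```python
import hashlib
import hmac
import json
import secrets
from typing import Optional


def new_nonce() -> str:
    """Generate an unguessable, URL-safe nonce for one verification attempt."""
    return secrets.token_urlsafe(32)


def sign_nonce(nonce: str, customer_secret: bytes) -> str:
    """Hex HMAC-SHA256 of the nonce, keyed with the customer's secret."""
    return hmac.new(customer_secret, nonce.encode("utf-8"), hashlib.sha256).hexdigest()


def build_verification_request(nonce: str):
    """Return (headers, body) for a request the endpoint can tell apart from real events."""
    headers = {
        "Content-Type": "application/json",
        "X-Webhook-Verification": nonce,
    }
    body = json.dumps({"type": "webhook.verification", "nonce": nonce})
    return headers, body


def response_is_valid(body: str, nonce: str,
                      customer_secret: Optional[bytes] = None) -> bool:
    """Accept a plain echo of the nonce, or its HMAC when a secret is configured."""
    expected = nonce if customer_secret is None else sign_nonce(nonce, customer_secret)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(body.strip().encode("utf-8"), expected.encode("utf-8"))
```

The HMAC variant is stronger than a plain echo because a third party that merely reflects request bodies cannot produce the right answer without the secret.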
The verification timeout matters. A healthy endpoint should respond within a few seconds; anything longer suggests the endpoint is overloaded or doesn't handle the verification properly. The hard timeout can be more generous, and 30 seconds is a common upper bound. If the customer's endpoint doesn't respond within that window, the subscription is marked as pending verification and the customer can retry.
The re-verification policy matters. Some services re-verify periodically (every few weeks or months) to catch endpoints that have drifted out of working condition. Others re-verify only on customer request. The trade-off is between catching drift early and adding load to working endpoints. A reasonable middle ground is to re-verify when delivery failures cross a threshold (such as 100 consecutive failures), which catches domains that no longer resolve and misconfigured endpoints without adding routine load. A threshold alone does not catch a parked domain that answers 200 OK to everything, because those deliveries never register as failures; that case still needs an occasional periodic check.
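A threshold-triggered policy needs very little state. The sketch below tracks consecutive failures per subscription and signals when re-verification should start; the class name, the status strings, and the method names are hypothetical, and the threshold of 100 is simply the figure used in the text.

```python
from dataclasses import dataclass

REVERIFY_AFTER_FAILURES = 100  # threshold from the text; tune per product


@dataclass
class SubscriptionHealth:
    consecutive_failures: int = 0
    status: str = "active"  # "active" or "pending_verification"

    def record_delivery(self, succeeded: bool) -> bool:
        """Update the counter; return True when re-verification should be triggered."""
        if succeeded:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        crossed = self.consecutive_failures >= REVERIFY_AFTER_FAILURES
        if crossed and self.status == "active":
            # Flip the status once so the trigger does not fire on every later failure.
            self.status = "pending_verification"
            return True
        return False

    def mark_verified(self) -> None:
        """Call after a successful challenge-response to resume deliveries."""
        self.consecutive_failures = 0
        self.status = "active"
```

Resetting the counter on any success means an endpoint that fails intermittently never reaches the threshold, which matches the intent of targeting endpoints that are fully broken rather than merely flaky.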
The DNS-rebinding subtlety
The straightforward implementation of "resolve the hostname, check the IP is public, then make the HTTP request" has a subtle time-of-check to time-of-use gap. The DNS lookup and the HTTP connection are two separate operations, and the HTTP client typically performs its own second lookup when it connects. An attacker who controls the hostname's DNS can return a public IP for the validation lookup and a private IP for the actual connection, exploiting the gap between the two.
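The fix is to resolve the hostname yourself, verify the IP is public, and then make the HTTP connection to that specific IP while still sending the original hostname in the Host header. For https, the original hostname must also be used for TLS server name indication and certificate verification; otherwise the certificate check fails or, worse, gets disabled. This forces the same IP to be used for both validation and connection. Most HTTP client libraries support this with some configuration; the technique is sometimes called "pinned DNS" or "address pinning".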
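One way to pin the address with only Python's standard library is to subclass http.client.HTTPSConnection and override connect(). This is a sketch under stated assumptions: the names pick_pinned_ip and PinnedHTTPSConnection are hypothetical, the 10-second default timeout is arbitrary, and other HTTP clients offer their own hooks for the same idea.

```python
import http.client
import ipaddress
import socket
import ssl


def pick_pinned_ip(hostname, port=443, resolver=socket.getaddrinfo) -> str:
    """Resolve once, reject any non-public answer, and return the IP to dial."""
    infos = resolver(hostname, port, type=socket.SOCK_STREAM)
    if not infos:
        raise ValueError(f"{hostname} does not resolve")
    for info in infos:
        ip = ipaddress.ip_address(info[4][0].split("%")[0])
        if not ip.is_global or ip.is_multicast:
            raise ValueError(f"{hostname} resolves to non-public address {ip}")
    return infos[0][4][0]


class PinnedHTTPSConnection(http.client.HTTPSConnection):
    """HTTPS connection that dials a pre-validated IP instead of re-resolving."""

    def __init__(self, host, pinned_ip, port=443, timeout=10.0, context=None):
        self._tls_context = context or ssl.create_default_context()
        super().__init__(host, port=port, timeout=timeout, context=self._tls_context)
        self._pinned_ip = pinned_ip

    def connect(self):
        # The TCP connection goes to the pinned IP; no second DNS lookup happens.
        sock = socket.create_connection((self._pinned_ip, self.port), self.timeout)
        # SNI and certificate verification still use the original hostname.
        self.sock = self._tls_context.wrap_socket(sock, server_hostname=self.host)
```

A delivery would call pick_pinned_ip(host), construct PinnedHTTPSConnection(host, ip), and then use request() and getresponse() as usual; http.client fills in the Host header from the original hostname. Because http.client does not follow redirects on its own, a 3xx response cannot silently send the request to an unvalidated address.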
The fix is necessary in production environments where the webhook delivery service has access to internal resources. Without it, an attacker who creates a subscription pointing to attacker.example.com can use DNS rebinding to make the delivery service hit services on localhost or the internal network whenever an event is delivered, effectively turning the webhook system into a server-side request forgery (SSRF) surface.
Error messages that help
The error messages returned during subscription validation should be specific and actionable. The customer is waiting for the response and will read the message. Generic errors like "Subscription rejected" are worse than no message; specific errors like "URL must use https" or "Hostname resolves to a private IP address" let the customer fix the problem immediately.
The verification failure message should distinguish between "endpoint did not respond" and "endpoint responded but with the wrong format" and "endpoint responded with the right nonce but the wrong signature". Each of these has a different remediation, and the customer needs to know which one occurred.
The error response should include a documentation link explaining the validation rules and the verification protocol. The customer who hits a validation failure is in a state where they want to understand the rules; the documentation should be directly reachable from the error message rather than requiring a separate search.
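The three requirements above (specific, distinguishable, linked to documentation) can be sketched as a small error catalogue. Everything named here is illustrative: the enum members, the JSON shape, and the documentation URL are assumptions, not an existing API.

```python
from enum import Enum
from typing import Optional

# Hypothetical documentation base URL, used only for illustration.
DOCS_URL = "https://docs.example.com/webhooks/validation"


class ValidationFailure(Enum):
    HTTPS_REQUIRED = "URL must use https"
    PRIVATE_IP = "Hostname resolves to a private IP address"
    NO_RESPONSE = "Endpoint did not respond to the verification request"
    WRONG_FORMAT = "Endpoint responded, but not in the expected verification format"
    WRONG_SIGNATURE = "Endpoint returned the right nonce but the wrong signature"


def error_response(failure: ValidationFailure, detail: Optional[str] = None) -> dict:
    """Build an error payload with a stable code, a readable message, and a docs link."""
    code = failure.name.lower()
    payload = {
        "code": code,
        "message": failure.value,
        # Anchor the link to the specific rule so the customer lands on the right section.
        "docs": f"{DOCS_URL}#{code}",
    }
    if detail:
        payload["detail"] = detail
    return {"error": payload}
```

Keeping a stable machine-readable code alongside the message lets a dashboard or client library react to each failure differently while the wording stays free to change.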
What this looks like across our products
Our four products (DocuMint for PDF invoice generation, CronPing for cron job monitoring, FlagBit for feature flags, and WebhookVault for webhook debugging) have webhook surfaces at varying maturity levels. CronPing has the most mature surface because monitor state changes are the primary integration point; WebhookVault is the product most directly built around webhook ergonomics and is where the validation discipline is most fully implemented.
The validation rules we apply are: the URL must parse, the scheme must be https (http is allowed only in sandbox mode), the hostname must resolve to a public IP, the port must be 443 or in the high range, the connection must be pinned to the validated IP to prevent rebinding, and a challenge-response verification must succeed before activation. The verification uses a unique nonce per subscription with an HMAC signature, and the nonce is valid for 24 hours. Verification failures show specific error messages with links to the documentation.
What does not work
Skipping validation entirely. The cost is paid later, in customer-support tickets and missed deliveries and (in the worst case) data leakage.
Validating only the URL syntax. The most common mistakes (localhost, private IPs, expired domains) pass syntax validation and require network-level checks to catch.
Validating only at subscription time and never re-verifying. Endpoints drift; domains expire; SSL certificates lapse. Threshold-triggered re-verification catches most drift without adding routine load, and periodic re-verification catches the rest at the cost of extra requests to working endpoints.
Failing silently. A subscription that fails verification should be visibly in a "pending" or "rejected" state in the customer dashboard, not just absent from the events log. The customer needs to know that the subscription is not active and what to do to activate it.
Three observations
First: webhook subscription validation is one of the places where doing the upfront work pays back many times over the lifetime of the subscription. The cost of validation is paid once; the cost of dealing with delivery failures from invalid subscriptions is paid on every event.
Second: the failure modes are predictable and most of them are addressed by a small set of rules. The eighty-percent solution is: parse the URL, check the scheme, check the hostname resolves to a public IP, send a verification request with a nonce, require the response to echo the nonce. The remaining twenty percent (DNS rebinding, re-verification, ownership disputes) is worth investing in but should not block the eighty-percent solution from shipping.
Third: the validation discipline is one of the visible markers of a thoughtful webhook product. Customers who have integrated webhooks across multiple services notice the difference between a service that accepts any string and a service that catches their typo before they leave the dashboard. The marker compounds: a service that gets validation right tends to get the rest of the webhook ergonomics right too, because the same care that produces good validation also produces good error messages, good delivery logging, and good replay tooling.
The deeper observation is that webhook subscriptions are a primitive that customers use rarely (once per integration) but that affects every event delivery. The leverage is high, the customer attention is low, and the failure modes are slow and silent. The investment in subscription validation is a way to convert a class of slow-silent failures into fast-loud errors that customers can fix at the time of subscription, before any event has been lost. The conversion is one of the highest-leverage product decisions a webhook service can make.