Polling vs Webhooks: When to Push and When to Pull

The choice between polling and webhooks looks like a technical decision but is actually about who bears the cost of latency and reliability. The patterns that hold up are the ones that match the cost-bearer to the party with the most leverage.

One of the recurring decisions in API design is whether to expose state changes as something callers poll for or something the API pushes to caller-supplied URLs. The textbook answer is push (webhooks) because polling is wasteful, but the textbook answer is too simple. Polling and webhooks have asymmetric costs and asymmetric reliability properties, and the right choice depends less on technical considerations than on who bears the operational cost of each failure mode and who has the most leverage to fix it.

The patterns in this post apply across the four products in our studio — DocuMint, CronPing, FlagBit, and WebhookVault — and to any system where one service needs to react to events in another.

Polling: caller-controlled, simple, latency-bounded

Polling means the caller fetches the current state on a schedule and notices changes by comparing successive responses. The implementation is straightforward: the caller runs a timer, hits an endpoint, parses the response, and acts on differences. The endpoint is a normal HTTP GET that returns the current state, possibly with cursor or timestamp parameters to limit the response to changes since the last poll.
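As a minimal sketch of the loop described above, assuming a hypothetical endpoint that returns a page shaped like {"events": [...], "next_cursor": "..."}: the fetch function is injected so the example stays independent of any HTTP library, and the names poll_once and run_polling are illustrative rather than part of any real API.

```python
import time
from typing import Callable, Optional

# Assumed page shape (hypothetical): {"events": [...], "next_cursor": "..."}
Fetch = Callable[[Optional[str]], dict]
Handle = Callable[[dict], None]


def poll_once(fetch: Fetch, cursor: Optional[str], handle: Handle) -> Optional[str]:
    """Fetch changes since `cursor`, handle each one, and return the new cursor."""
    page = fetch(cursor)
    for event in page["events"]:
        handle(event)
    # The cursor advances only after every event was handled. If handle()
    # raises, the caller keeps the old cursor and the next poll sees the
    # same events again.
    return page.get("next_cursor") or cursor


def run_polling(fetch: Fetch, handle: Handle, interval_s: float = 60.0,
                max_polls: Optional[int] = None) -> Optional[str]:
    """Poll on a fixed interval; `max_polls` exists only to make the loop testable."""
    cursor: Optional[str] = None
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            cursor = poll_once(fetch, cursor, handle)
        except Exception:
            pass  # keep the old cursor and try again on the next tick
        polls += 1
        time.sleep(interval_s)
    return cursor
```

In a real integration, fetch would wrap an HTTP GET against the API and the cursor would be persisted between process restarts.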

The advantages of polling are subtle but real. The caller controls the polling rate, which means it can ramp up during busy periods and ramp down during idle periods without coordinating with the API. The caller controls the polling loop, which means it runs on the caller's own infrastructure with whatever reliability properties the caller wants, and it needs no publicly reachable endpoint. The caller controls the response handling, which means a failure to handle a response just means waiting for the next poll: there's no missed-event risk because the next poll will see the same state (or, with a cursor that advances only after successful handling, the same changes).

The disadvantage of polling is that request load scales linearly with the number of callers and inversely with the polling interval. A million callers polling every minute is roughly 16,700 requests per second against an API that may have very few state changes. The latency of state propagation is bounded above by the polling interval, so a one-minute poll means up to one minute of staleness in the worst case. The wasted work of polling when nothing has changed is the price paid for the simplicity and reliability.
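The arithmetic is simple enough to sketch. The function names below are illustrative, and the average-staleness figure assumes state changes arrive at times unrelated to each caller's polling schedule.

```python
def polling_load(callers: int, interval_s: float) -> float:
    """Requests per second the API serves when every caller polls once per interval."""
    return callers / interval_s


def worst_case_staleness(interval_s: float) -> float:
    """A change that lands just after a poll waits a full interval to be seen."""
    return interval_s


def average_staleness(interval_s: float) -> float:
    """Assumes changes are uniformly distributed within the polling interval."""
    return interval_s / 2


if __name__ == "__main__":
    print(round(polling_load(1_000_000, 60)))  # requests per second
    print(worst_case_staleness(60), average_staleness(60))
```

Halving the interval halves the staleness and doubles the request load, which is the trade the caller is making every time it tunes the timer.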

Webhooks: API-controlled, complex, latency-near-zero

Webhooks mean the API pushes events to URLs the caller has registered, typically via HTTP POST. The implementation is more involved on both sides. The API has to maintain a list of subscriptions, generate events on state changes, deliver events with retries on failure, and handle the security of a request originating from inside the API and going out to a caller-supplied URL. The caller has to expose a public HTTP endpoint, verify the events are genuine (signature verification), handle them idempotently, and respond quickly enough not to time out the API's delivery.
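The caller-side duties of handling events idempotently and responding quickly can be sketched as follows. This assumes each event carries a unique "id" field, which is a common convention but an assumption here; the class name is hypothetical and signature verification is left out to keep the sketch short.

```python
import json
import queue


class WebhookReceiver:
    """Acknowledge fast, deduplicate by event id, and defer the real work."""

    def __init__(self) -> None:
        self.seen_ids: set = set()                # in-memory only for illustration
        self.work: queue.Queue = queue.Queue()    # drained by a background worker

    def receive(self, raw_body: bytes) -> int:
        """Return the HTTP status code to send back to the API."""
        try:
            event = json.loads(raw_body)
            event_id = event["id"]  # assumed field name
        except (ValueError, KeyError, TypeError):
            return 400  # malformed payload; retrying will not help
        if event_id in self.seen_ids:
            return 200  # duplicate delivery: acknowledge and do nothing
        self.seen_ids.add(event_id)
        self.work.put(event)  # enqueue rather than process inline
        return 200
```

A production receiver would keep the seen-id set in a durable store with an expiry rather than in process memory, since retries can arrive after a restart.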

The advantages of webhooks are real when the event-to-action latency matters. State changes propagate within seconds rather than minutes; callers don't waste cycles polling for nothing; the API can deliver events efficiently to many subscribers without each one paying a polling cost. For high-volume integrations where the cost of latency is real money — payment confirmations, fraud detection, inventory updates — webhooks are usually the right answer.

The disadvantages of webhooks are that they fail in subtle ways. A caller's endpoint goes down for an hour and the API has to retry; the caller's endpoint comes back up and the events arrive out of order; the caller's endpoint is slow and times out delivery; the caller's endpoint is fast but has a bug that drops events silently. The reliability of the integration depends on the weakest link, and the weakest link is usually the caller's endpoint, which the API has no control over.
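One common defense against out-of-order delivery, sketched here under the assumption that each event carries a per-resource, monotonically increasing "version" (the field names resource_id, version, and data are hypothetical), is to ignore any event older than what has already been applied.

```python
def apply_if_newer(state: dict, event: dict) -> bool:
    """Apply `event` to `state` unless an equal or newer version is already stored.

    Returns True if the event was applied, False if it was stale or a duplicate.
    """
    key = event["resource_id"]
    current = state.get(key)
    if current is not None and current["version"] >= event["version"]:
        return False  # late or repeated delivery: safe to drop
    state[key] = {"version": event["version"], "data": event["data"]}
    return True
```

This makes delivery order irrelevant for last-write-wins state, though it does not help when every intermediate event matters; that case needs the event log and cursor that a polling endpoint provides.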

The cost asymmetry

The polling-vs-webhooks decision is often framed as a question of efficiency, but the more useful framing is who bears the cost of each failure mode. With polling, the caller bears most of the costs: the cost of running the polling loop, the cost of the wasted requests, the cost of any latency from the polling interval. The API bears only the cost of serving the GET requests, which is bounded and predictable.

With webhooks, the API bears most of the costs: the cost of maintaining subscriptions, the cost of generating and signing events, the cost of retrying failed deliveries, the cost of investigating delivery problems. The caller bears the cost of running an endpoint, which they often don't want to do — many integrations are between an API and a caller that doesn't have public infrastructure to begin with.

The pattern that follows from this is that the right choice depends on which party has the most leverage to fix the failure modes. If the API team is small and the integrators are large enterprises with sophisticated infrastructure, polling makes sense because the integrators can absorb the operational complexity. If the API team is large and the integrators are small developers who just want the events delivered, webhooks make sense because the API team can absorb the operational complexity.

The hybrid pattern: webhooks with polling as fallback

The pattern that holds up in the largest production deployments is to offer both. The primary mechanism is webhooks, with the API pushing events to caller-registered URLs as they happen. The fallback mechanism is polling, with the API exposing a list-events endpoint that returns events since a given cursor. Callers consume events via webhooks during normal operation and via polling when their endpoint has been down or when they're recovering from an outage.

The hybrid pattern handles the common failure modes naturally. Suppose the caller's endpoint is down for an hour: webhooks fail, and the API gives up once the retry budget is exhausted. When the endpoint comes back up, the caller polls the list-events endpoint to catch up on what was missed, and is back in sync. The pattern requires the API to maintain an event log with a stable cursor, but the cost of that infrastructure is small compared to the cost of debugging missed-event reports from integrators.
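A minimal in-memory sketch of both halves of that recovery path follows. The EventLog class, the integer cursor, and the has_more flag are assumptions chosen for illustration, not a description of any particular API.

```python
from typing import Callable


class EventLog:
    """API side: an append-only log whose cursor is the count of events already seen."""

    def __init__(self) -> None:
        self._events: list = []

    def append(self, payload: dict) -> int:
        self._events.append(payload)
        return len(self._events)  # position after this event

    def list_since(self, cursor: int = 0, limit: int = 100) -> dict:
        page = self._events[cursor:cursor + limit]
        next_cursor = cursor + len(page)
        return {
            "events": page,
            "next_cursor": next_cursor,
            "has_more": next_cursor < len(self._events),
        }


def catch_up(log: EventLog, cursor: int, handle: Callable[[dict], None],
             limit: int = 100) -> int:
    """Caller side: page through everything after `cursor`, return the new cursor."""
    while True:
        page = log.list_since(cursor, limit)
        for event in page["events"]:
            handle(event)
        cursor = page["next_cursor"]
        if not page["has_more"]:
            return cursor
```

The cursor is stable because the log is append-only: a position that was valid before the outage still points at the same place afterwards. Events that also arrived by webhook will be seen twice, so the handler has to be idempotent.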

Stripe is the canonical example of the hybrid pattern. Webhooks are the primary delivery mechanism for events; the events.list endpoint is the polling fallback. Integrators are encouraged to use webhooks for low-latency event handling and polling for reconciliation and recovery. The combination handles essentially every failure mode that pure webhooks alone cannot.

SSE and long polling: the middle ground

Between polling and webhooks lies a family of techniques that share characteristics with both: server-sent events, long polling, WebSockets. The idea is that the caller initiates a long-lived connection to the API, and the API streams events down the connection as they happen. The caller doesn't need a public endpoint; the API doesn't need to retry failed deliveries; the latency is near zero.

The trade-off is that long-lived connections are operationally more complex than either short HTTP requests or POST-based webhooks. The caller has to handle reconnection on connection loss; the API has to handle backpressure when the caller is slow to consume; the infrastructure has to support long-lived connections through load balancers and reverse proxies that may have their own timeouts. SSE is the right answer for some integrations — particularly real-time dashboards and notification feeds — but it's not a drop-in replacement for either polling or webhooks.
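To make the reconnection burden concrete, here is a simplified sketch of parsing a server-sent event stream while tracking the last event id. On reconnect, an SSE client sends that id back in the Last-Event-ID request header so the server can resume from it. The parser handles only the id and data fields and ignores event and retry, so it is an illustration rather than a complete client.

```python
from typing import Iterable, Iterator, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (last_event_id, data) for each event in an iterable of text lines."""
    last_id: Optional[str] = None
    data: list = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has any data.
            if data:
                yield last_id, "\n".join(data)
            data = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "data":
                data.append(value)
            elif field == "id":
                last_id = value
```

A caller would persist the most recent id it has fully handled and, after a dropped connection, reconnect with a backoff and that id; everything else about resumption is up to the server.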

The integration we actually built

For the four products in our studio, we offer webhooks as the primary mechanism for event delivery: payment events, monitor state changes, flag updates, and webhook deliveries. Each event is signed with HMAC-SHA256 over the raw body bytes, retried with exponential backoff for up to twenty-four hours, and viewable through a dashboard that shows delivery attempts and responses. Integrators who want polling get an events-list endpoint they can poll on whatever schedule they like.
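The signing and retry scheme can be sketched with the Python standard library. The hex encoding of the signature, the one-minute base delay, and the doubling factor are assumptions made for illustration; only the use of HMAC-SHA256 over the raw body and the twenty-four-hour budget come from the description above.

```python
import hashlib
import hmac


def sign(secret: bytes, raw_body: bytes) -> str:
    """HMAC-SHA256 over the exact bytes sent on the wire, hex-encoded (assumed encoding)."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify(secret: bytes, raw_body: bytes, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign(secret, raw_body)
    return hmac.compare_digest(expected.encode(), signature.encode())


def retry_delays(base_s: float = 60.0, factor: float = 2.0,
                 budget_s: float = 24 * 3600) -> list:
    """Exponentially growing waits whose total stays within the retry budget."""
    delays = []
    elapsed = 0.0
    delay = base_s
    while elapsed + delay <= budget_s:
        delays.append(delay)
        elapsed += delay
        delay *= factor
    return delays
```

Verification has to run against the raw request bytes, before any JSON parsing, because re-serializing the body can change whitespace or key order and invalidate the signature.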

The decision to offer both was less about technical considerations than about respecting the diversity of integrators. Some are professional developers running production infrastructure who want webhooks; some are weekend hobbyists running cron jobs who prefer polling because it doesn't require them to expose a public endpoint. The hybrid approach lets each integrator choose the pattern that fits their operational comfort level, and the cost on our side is small.

The deeper observation

The polling-vs-webhooks debate is usually framed as a technical question about efficiency, but the more useful framing is about who bears the operational cost. Polling shifts cost to the caller in exchange for simplicity and reliability; webhooks shift cost to the API in exchange for low latency. The hybrid pattern lets each integration choose the right balance for its situation, and the small infrastructure cost on the API side is paid back many times over in reduced support burden. The teams that take this seriously offer both and let integrators choose; the teams that don't take it seriously insist on one or the other and find that integrators route around them.
