Real-Time Transport for SaaS APIs: SSE, WebSockets, and Long Polling Compared

The choice between Server-Sent Events, WebSockets, and long polling is not really a choice between protocols. It is a choice about which complexity tax you want to pay, and where in your stack you want to pay it.

The first time you need to push data from a server to a browser without making the user reload, you discover that the web has roughly three answers, none of them obviously the right one. Long polling has been around since the 1990s and still works. WebSockets are a real bidirectional protocol but require a full duplex connection that fights with most HTTP infrastructure. Server-Sent Events feel like a compromise that almost no one talks about and turn out to be exactly right for more cases than the noise around WebSockets would suggest.

The choice between them is rarely about technical capability: all three can deliver a notification within a second or two of an event happening. What differs is the operational cost each one imposes and which layer of your stack absorbs it. This piece walks through the actual failure modes, not the marketing comparison.

Long polling: the workhorse

Long polling is the simplest pattern: the client makes an HTTP request, the server holds the connection open until either an event happens or a timeout fires, then responds. The client immediately makes another request. The pattern is so simple that it survives every kind of corporate firewall, load balancer, and proxy because it is just HTTP requests.
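The hold-until-event-or-timeout step can be sketched with nothing but asyncio. This is a minimal illustration, not a production handler: LongPollChannel, publish, and poll are hypothetical names, and the HTTP layer that would wrap them is left out.

```python
import asyncio
from typing import List, Optional


class LongPollChannel:
    """In-memory stand-in for the events a single client is waiting on."""

    def __init__(self) -> None:
        self._queue = asyncio.Queue()

    def publish(self, event: str) -> None:
        # Called by application code when something happens.
        self._queue.put_nowait(event)

    async def poll(self, timeout: float) -> Optional[str]:
        # Hold the "request" open until an event arrives or the timeout fires.
        try:
            return await asyncio.wait_for(self._queue.get(), timeout)
        except asyncio.TimeoutError:
            # The HTTP layer would answer with an empty response here,
            # and the client would immediately poll again.
            return None


async def demo_long_poll() -> List[Optional[str]]:
    channel = LongPollChannel()
    results = []
    # First poll: nothing is published, so the hold times out.
    results.append(await channel.poll(timeout=0.05))
    # Second poll: an event arrives while the request is being held.
    asyncio.get_running_loop().call_later(0.01, channel.publish, "build-finished")
    results.append(await channel.poll(timeout=1.0))
    return results


if __name__ == "__main__":
    print(asyncio.run(demo_long_poll()))
```

The timeouts are shortened so the demo finishes quickly; a real deployment would hold for tens of seconds.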

Its downsides become visible at scale. Each request is a full HTTP round trip with all its headers and cookies and, unless the connection is kept alive between polls, a fresh TCP and TLS handshake as well, all of it amortized across a single event. If the average user is connected for an hour and events are rare, you are doing dozens of requests for nothing. The server has to hold each connection open, which means tuning the connection pool, the timeout discipline, and the upstream socket budget. None of this is impossible, but the operational burden grows linearly with concurrent users.
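To put a rough number on the wasted requests, here is the arithmetic for an hour-long session in which no event ever fires. The 30-second and 60-second hold timeouts are assumptions chosen for illustration, and idle_polls is a hypothetical helper.

```python
def idle_polls(session_seconds: int, poll_timeout_seconds: int) -> int:
    """Requests a client makes over a session in which no event ever fires."""
    # Each empty poll lasts one full timeout, then the client re-polls at once.
    return session_seconds // poll_timeout_seconds


one_hour = 60 * 60
print(idle_polls(one_hour, 30))  # 120 requests with a 30-second hold
print(idle_polls(one_hour, 60))  # 60 requests with a 60-second hold
```

Multiply by the number of concurrent users to see how the idle request rate grows with the audience.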

Where long polling shines is in environments you do not control: customer corporate networks, behind NAT, behind proxies that strip the Upgrade header. If your product has to work in a hospital IT environment, long polling is the safest answer because it depends on nothing beyond ordinary HTTP requests and responses.

Server-Sent Events: the underappreciated middle

Server-Sent Events (SSE) is a one-way streaming protocol over HTTP. The server holds a single connection open and pushes text events down it as they happen. The client reconnects automatically if the connection drops, and a built-in event ID gives you resume-where-you-left-off semantics. Opening a stream with the browser API (EventSource) takes a single line of JavaScript.
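On the wire, an SSE stream is a text/event-stream response made of "field: value" lines, with a blank line ending each event. A small sketch of a serializer follows; format_sse is a hypothetical helper name, not part of any framework.

```python
from typing import Optional


def format_sse(data: str, event_id: Optional[str] = None,
               event: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines is sent as one "data:" line per payload line.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(format_sse('{"status": "passed"}', event_id="7", event="build"), end="")
```

A server writes strings like these to the open response as events occur; the browser's EventSource parses them back into message events.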

SSE has three properties that make it the right default for most server-to-client push:

  • It is HTTP. Load balancers, proxies, and CDNs already understand it, and there is no Upgrade header negotiation. The one thing to verify is that intermediaries do not buffer the streaming response.
  • It is one connection per client, not one per event. The TLS handshake happens once. The connection lives for as long as the user stays on the page.
  • It has built-in reconnect with event ID continuity: the browser reconnects on its own and sends the last ID it saw in a Last-Event-ID header, so handling temporary network blips requires zero client-side code.
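The reconnect itself is automatic in the browser, but the server still has to honor the Last-Event-ID header for the resume to mean anything. One possible shape for that, assuming integer event IDs and a bounded in-memory history (ReplayBuffer and its methods are hypothetical names):

```python
from collections import deque
from typing import List, Optional, Tuple


class ReplayBuffer:
    """Keeps the most recent events so a reconnecting client can catch up."""

    def __init__(self, max_events: int = 1000) -> None:
        self._events = deque(maxlen=max_events)  # oldest entries fall off
        self._next_id = 1

    def append(self, data: str) -> int:
        event_id = self._next_id
        self._next_id += 1
        self._events.append((event_id, data))
        return event_id

    def since(self, last_event_id: Optional[str]) -> List[Tuple[int, str]]:
        # A first-time client sends no Last-Event-ID header: nothing to replay.
        if last_event_id is None:
            return []
        try:
            last_seen = int(last_event_id)
        except ValueError:
            return []  # unrecognized ID: treat it like a fresh connection
        return [(i, d) for i, d in self._events if i > last_seen]


if __name__ == "__main__":
    buffer = ReplayBuffer()
    for payload in ("a", "b", "c"):
        buffer.append(payload)
    print(buffer.since("1"))  # the events the client missed after ID 1
```

On reconnect, the handler would send everything returned by since() before switching to live events. Events older than the buffer's capacity are lost, so the size is a trade-off between memory and how long an outage you can paper over.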

Its limitations are also real. SSE is server-to-client only. If you need the client to push to the server, you do that as a regular HTTP POST on a separate connection. Over HTTP/1.1, browsers cap connections per origin at around six, and that budget is shared across every tab open to the same origin, so a user cannot have many simultaneous SSE streams (HTTP/2 multiplexes streams over one connection and lifts the limit substantially). And SSE is text-only: binary data has to be base64-encoded, which is fine for small payloads but expensive at scale.
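The base64 cost is easy to quantify: every 3 bytes of input become 4 bytes of output, roughly a one-third increase before any compression. A quick check, using random bytes as a stand-in for a binary payload:

```python
import base64
import os

payload = os.urandom(3000)           # stand-in for a binary frame
encoded = base64.b64encode(payload)  # what would go on an SSE "data:" line

print(len(payload))  # 3000
print(len(encoded))  # 4000: 4 output bytes for every 3 input bytes
```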

For dashboards, notifications, build status, log tails, deployment progress, and most "show me when something changes" use cases, SSE is the right answer and almost no one reaches for it because the WebSocket noise is louder.

WebSockets: when bidirectional is non-negotiable

WebSockets are a full duplex protocol. After an HTTP handshake that upgrades the connection, both sides can send messages whenever they want. Binary is native. Frames are tiny. Latency is as low as the network allows.

The cost is everything that comes with breaking the HTTP contract. Most CDNs and reverse proxies require explicit configuration to handle WebSocket connections. Authentication tends to be tricky because the browser WebSocket API cannot set an Authorization header on the handshake, and there is no standard place for credentials once the connection is open. Heartbeats and reconnect logic have to be implemented by hand: the protocol defines ping and pong frames, but the browser API does not expose them, and nothing reconnects for you. Server-side, you cannot rely on the same per-request middleware stack that serves the rest of your API. WebSocket handlers usually live in their own corner of the codebase with their own auth code, their own rate limiting, and their own observability.
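As one example of the hand-written reconnect logic, a client usually needs a retry schedule that backs off and adds jitter so that thousands of clients do not reconnect in lockstep after an outage. The sketch below uses exponential backoff with full jitter; the base and cap values are illustrative assumptions, and reconnect_delays is a hypothetical helper.

```python
import random
from typing import List, Optional


def reconnect_delays(attempts: int, base: float = 0.5, cap: float = 30.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Delay in seconds to wait before each successive reconnect attempt."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        # The ceiling doubles each attempt until it reaches the cap.
        ceiling = min(cap, base * 2 ** attempt)
        # "Full jitter": pick uniformly between zero and the ceiling.
        delays.append(rng.uniform(0, ceiling))
    return delays


if __name__ == "__main__":
    for attempt, delay in enumerate(reconnect_delays(6), start=1):
        print(f"attempt {attempt}: wait {delay:.2f}s")
```

A client would sleep for each delay in turn and reset the schedule once a connection succeeds. SSE clients get a simpler version of this behavior from the browser for free.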

The cases where WebSockets are unambiguously the right choice are interactive real-time experiences: collaborative editing, multiplayer games, voice and video signaling, trading interfaces. If the user is typing into the connection in real time and you need server responses inside 50 ms, WebSockets are correct. If you are pushing notifications to a dashboard, you are paying the WebSocket complexity tax for no benefit.

The horizontal scaling problem

The protocol choice is the easy part. The hard part is scaling out. Long polling, SSE, and WebSockets all share the same scaling challenge: each connected client lives on one specific server, and events that need to reach that client must somehow find that server.

Three patterns solve this:

Sticky sessions plus pub-sub. The load balancer routes a given client to a given server. The server subscribes to a Redis or NATS or RabbitMQ channel for that client. When an event for that client is published, the server consumes it and pushes it down the connection. This works well for moderate scale and has the benefit of localizing state.
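The routing idea can be shown without a real broker. In this sketch an in-memory class stands in for Redis, NATS, or RabbitMQ; InMemoryBroker and the "client:42" channel naming are hypothetical, not the API of any of those systems.

```python
import asyncio
from collections import defaultdict
from typing import List


class InMemoryBroker:
    """Stand-in for a pub-sub broker: channel name -> subscriber queues."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, channel: str) -> asyncio.Queue:
        queue = asyncio.Queue()
        self._subscribers[channel].append(queue)
        return queue

    def publish(self, channel: str, message: str) -> int:
        # Deliver to every server subscribed to this client's channel.
        for queue in self._subscribers[channel]:
            queue.put_nowait(message)
        return len(self._subscribers[channel])


async def demo_pubsub() -> List[object]:
    broker = InMemoryBroker()
    # "Server A" holds the connection for client 42, so it subscribes for it.
    inbox = broker.subscribe("client:42")
    # Any application process can publish without knowing who holds the socket.
    delivered = broker.publish("client:42", "invoice-paid")
    # No server holds a connection for client 99, so nothing is delivered.
    missed = broker.publish("client:99", "nobody-connected")
    return [await inbox.get(), delivered, missed]


if __name__ == "__main__":
    print(asyncio.run(demo_pubsub()))
```

The server holding the connection would read from its inbox in a loop and write each message down the client's stream. Note the second publish: an event for a disconnected client goes nowhere, which is why this pattern is usually paired with some form of replay.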

Connection-aware fanout. A separate connection layer (a "hub" tier) holds all the long-lived connections and subscribes to events from the application tier. The application tier never holds a connection. This is the pattern that hosted real-time platforms such as Pusher, Ably, and PubNub use, and it is what you would build if you needed to scale to millions of concurrent connections without burning your application servers' file descriptor budget on idle clients.

Per-tenant isolation. If your tenants are workspaces or organizations, you can shard the connection routing by tenant ID. Tenant A's users always go to one cluster of servers, tenant B's go to another. This trades some flexibility for much simpler routing and much better isolation under failure.
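A minimal way to sketch tenant-based routing is a stable hash of the tenant ID mapped onto a list of clusters. The cluster names and cluster_for_tenant are hypothetical, and the modulo mapping is the simplest possible choice rather than a recommendation.

```python
import hashlib
from typing import Sequence

# Hypothetical cluster names, for illustration only.
CLUSTERS = ("rt-cluster-a", "rt-cluster-b", "rt-cluster-c")


def cluster_for_tenant(tenant_id: str, clusters: Sequence[str] = CLUSTERS) -> str:
    """Map a tenant to the same cluster on every call and in every process."""
    # Use a stable hash: Python's built-in hash() of a str varies per process.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]


if __name__ == "__main__":
    for tenant in ("acme", "globex", "initech"):
        print(tenant, "->", cluster_for_tenant(tenant))
```

With plain modulo hashing, changing the number of clusters remaps most tenants. An explicit tenant-to-cluster lookup table or consistent hashing avoids that, at the cost of more moving parts.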

What we actually do

Across our four products — DocuMint, CronPing, FlagBit, and WebhookVault — we use long polling for nothing and SSE for almost everything. WebhookVault uses SSE to live-stream incoming webhook captures into the dashboard. CronPing uses SSE to push monitor status changes. The connection lives for as long as the dashboard tab is open, the events are tiny JSON, the reconnect-on-network-blip is automatic, and the entire serving stack is a few hundred lines of FastAPI.

If we ever need bidirectional with low latency — collaborative flag editing in FlagBit, for instance — we will reach for WebSockets and pay the tax. We have not needed to yet, and the temptation to reach for them by default is the temptation to add complexity that does not earn its keep. The deeper point is that the right protocol is the one that buys you the capabilities you actually use, and not the most general one available. SSE is the underappreciated middle, and almost any "I need real time" requirement is really an SSE requirement in disguise.
