WebSockets in Production: Sticky Sessions, Reconnection, and the Patterns That Survive

Most WebSocket implementations work fine in development and break in production at the boundaries: load balancer assignment, reconnection after network blips, server restarts, scaling beyond a single instance. The patterns that survive aren't complicated, but they aren't optional either.

WebSockets occupy an interesting position in 2026: the protocol has been a stable standard since RFC 6455 was published in 2011, every major browser supports it, and most teams still get the production deployment wrong. The protocol itself is simple. The patterns that surround it (load balancer configuration, reconnection logic, scaling beyond a single server, message ordering, dead-connection detection) are where the real work lives, and most introductory documentation skips over them entirely. The result is a class of WebSocket bugs that don't appear in development, appear briefly in staging, and become routine in production.

The patterns in this post matter for any system maintaining persistent connections to clients. We use SSE for almost everything across DocuMint, CronPing, FlagBit, and WebhookVault — but the deployment patterns for SSE and WebSockets are nearly identical, and the lessons transfer.

The connection lifecycle

A WebSocket connection starts as an HTTP/1.1 request with an Upgrade: websocket header. The server responds with a 101 Switching Protocols, and from that point onward the TCP connection carries WebSocket frames in both directions until either party closes it. The connection persists as long as the underlying TCP connection persists, which in real network conditions is shorter than developers usually assume — corporate firewalls and home routers commonly close idle TCP connections after five to fifteen minutes, mobile networks often close them when handing off between cell towers, and any kind of network blip drops them entirely.
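To make the upgrade step concrete, here is a minimal sketch of the server side of the handshake. It goes slightly beyond what the paragraph above describes: RFC 6455 also requires the server to answer the client's Sec-WebSocket-Key header with a Sec-WebSocket-Accept value derived from a fixed GUID. The function names are illustrative, and a real server would let its WebSocket library do this.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_value(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(sec_websocket_key: str) -> str:
    """Build the 101 response that completes the upgrade (illustrative helper)."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_value(sec_websocket_key)}\r\n"
        "\r\n"
    )
```

After this response is written, the same TCP socket carries WebSocket frames rather than HTTP.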

This persistence assumption is where most production bugs originate. A WebSocket connection that lasts 3 hours in development on a well-connected machine may last 4 minutes on a customer's office Wi-Fi behind a NAT with aggressive idle timeouts. Code that assumes the connection will be stable will eventually fail in obvious ways, and code that assumes the connection might disconnect at any moment will mostly work.

Sticky sessions and the load balancer

If you're running more than one server instance, the first thing that breaks is the load balancer. A typical HTTP load balancer round-robins requests across instances. When a WebSocket request comes in and gets routed to instance A, the upgrade succeeds and the persistent connection is established with instance A specifically. Any state tied to that connection now lives on instance A, so an architecture that assumes any instance can serve any client is wrong.

The standard answer is sticky sessions: the load balancer remembers which instance each client was assigned to and routes subsequent connections from that client back to the same instance. AWS ALB supports this via stickiness configuration on the target group. Cloudflare supports it via session affinity. Most modern load balancers have the option somewhere in the configuration. The implementation usually relies on a cookie set by the load balancer.

Sticky sessions handle the common case but introduce a subtle scaling problem: if instance A has 10,000 connected clients and instance B has 100, clients pinned to instance A will keep being routed to instance A regardless of load. The cookie-based stickiness is durable across the client's session, which means a load imbalance can persist for hours. The fix is connection rebalancing during reconnection: when a client reconnects (after a network blip, after the server restarts, or because the connection has been alive too long), the load balancer should consider current instance load when assigning the new connection.
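As a rough illustration of load-aware assignment on reconnect, the sketch below picks the instance with the fewest active connections. The function name and the shape of the input (a mapping from instance name to connection count) are assumptions for the example; in practice this decision lives in the load balancer's configuration, often as a "least connections" policy, not in application code.

```python
def pick_instance(connections_per_instance: dict[str, int]) -> str:
    """Return the instance with the fewest active connections.

    Ties are broken by instance name so the choice is deterministic.
    """
    if not connections_per_instance:
        raise ValueError("no instances available")
    return min(
        connections_per_instance,
        key=lambda name: (connections_per_instance[name], name),
    )
```

With the numbers from the paragraph above, a reconnecting client would be assigned to instance B, not back to the overloaded instance A.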

Reconnection logic

Client-side reconnection is the highest-leverage thing the client can implement. The minimum viable version is exponential backoff with jitter: after a disconnect, wait a random interval between 0.5 and 1 seconds, then double the wait time on each failed reconnect attempt up to some cap (30-60 seconds is reasonable). The exponential part prevents reconnection storms when many clients disconnect simultaneously (server restart, network blip affecting a region); the jitter spreads the storm over time so the recovering server isn't hit by a synchronous wave.
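A minimal sketch of that delay calculation, assuming a 1-second starting ceiling and a 30-second cap. The name `reconnect_delay` and its parameters are illustrative. Drawing from the upper half of the current ceiling is one jitter scheme among several, chosen here because it reproduces the 0.5 to 1 second first wait described above.

```python
import random


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 30.0, rng=random) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based).

    The ceiling doubles on each failed attempt until it reaches `cap`.
    The actual delay is drawn uniformly from the upper half of the ceiling,
    so attempt 0 waits between 0.5 and 1 second with the defaults.
    """
    exponent = min(attempt, 32)  # bound the exponent so huge attempt counts stay cheap
    ceiling = min(cap, base * (2 ** exponent))
    return rng.uniform(ceiling / 2, ceiling)
```

A client would call this with a counter that increments on every failed attempt and resets to zero once a connection succeeds.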

The harder problem is what to do about messages sent during the disconnected interval. The two general approaches are sequence numbers (each message has a monotonically increasing ID, the server tracks which IDs each client has acknowledged, and the client requests anything missing on reconnect) and event sourcing with checkpoints (the server maintains a durable event log and the client tells the server its last seen position on reconnect). Both work; the tradeoffs are mostly about how much message history the server is willing to keep and how stateful the client connection needs to be.
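The sequence-number approach can be sketched as a bounded per-client buffer on the server. Everything here is hypothetical (the class name, the in-memory storage, the default limit of 1000 messages); a production version would more likely keep this history in a durable store. It shows the core idea: the client reports the last sequence number it saw, and the server either replays what is missing or signals that its history no longer reaches back far enough.

```python
from collections import deque


class ReplayBuffer:
    """Bounded history of messages sent to one client, keyed by sequence number."""

    def __init__(self, max_messages: int = 1000):
        self._messages = deque(maxlen=max_messages)  # (seq, payload), oldest first
        self._next_seq = 1

    def append(self, payload: str) -> int:
        """Record an outgoing message and return the sequence number assigned to it."""
        seq = self._next_seq
        self._next_seq += 1
        self._messages.append((seq, payload))
        return seq

    def replay_after(self, last_seen_seq: int):
        """Return messages newer than `last_seen_seq`, or None if some were evicted."""
        if last_seen_seq >= self._next_seq - 1:
            return []  # client is already up to date
        oldest = self._messages[0][0] if self._messages else self._next_seq
        if last_seen_seq + 1 < oldest:
            return None  # gap in history: caller should fall back to a full resync
        return [(seq, p) for seq, p in self._messages if seq > last_seen_seq]
```

The `max_messages` limit is the "how much history is the server willing to keep" tradeoff in code form: a larger buffer tolerates longer disconnects at the cost of memory per client.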

The wrong answer is to ignore the problem and pretend that messages sent during disconnection don't matter. They do matter, and the customers who notice are the ones with bad networks, which is most customers most of the time. A good test is to deliberately disconnect the client for 30 seconds during operation and verify that the application recovers correctly when the connection comes back. If it doesn't, the application has a bug, regardless of how fast the network usually is.

Heartbeats and dead connection detection

TCP itself doesn't reliably tell you when a connection is dead. If the network path between two endpoints fails (unplugged cable, dropped Wi-Fi, dead cell tower), neither TCP stack is notified. The connection appears alive (sockets are still open, no errors have been raised) but no traffic is flowing. The OS-level keepalive mechanism (the SO_KEEPALIVE socket option) eventually detects this, but the default timeouts on most systems are measured in hours; Linux, for example, waits two hours before sending the first probe.

For WebSocket connections, the application needs its own heartbeat mechanism. The protocol provides ping and pong frames specifically for this purpose. The standard pattern is for the server to send a ping every 30 seconds and consider the connection dead if it doesn't receive a pong within some window (60-90 seconds is reasonable). When the connection is determined dead, the server closes its socket and removes the client from any internal state. The client's reconnection logic kicks in.
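The bookkeeping behind that pattern is small. The sketch below tracks ping and pong times per connection and leaves the actual frame sending to whatever WebSocket library is in use. The class, its method names, and the 90-second default timeout are assumptions for illustration; the clock is injectable so the logic can be tested without waiting.

```python
import time


class HeartbeatMonitor:
    """Tracks ping/pong timing per connection; does not send frames itself."""

    def __init__(self, ping_interval=30.0, pong_timeout=90.0, clock=time.monotonic):
        self._ping_interval = ping_interval
        self._pong_timeout = pong_timeout
        self._clock = clock
        self._last_ping = {}  # connection id -> time of last ping sent
        self._last_pong = {}  # connection id -> time of last pong received

    def register(self, conn_id):
        now = self._clock()
        self._last_ping[conn_id] = now
        self._last_pong[conn_id] = now

    def on_pong(self, conn_id):
        if conn_id in self._last_pong:
            self._last_pong[conn_id] = self._clock()

    def due_for_ping(self):
        """Return connections that should be pinged now, and mark them as pinged."""
        now = self._clock()
        due = [c for c, t in self._last_ping.items() if now - t >= self._ping_interval]
        for conn_id in due:
            self._last_ping[conn_id] = now
        return due

    def reap_dead(self):
        """Remove and return connections whose last pong is older than the timeout."""
        now = self._clock()
        dead = [c for c, t in self._last_pong.items() if now - t > self._pong_timeout]
        for conn_id in dead:
            del self._last_ping[conn_id]
            del self._last_pong[conn_id]
        return dead
```

A server loop would call `due_for_ping()` and `reap_dead()` on a timer, send ping frames to the first list, and close the sockets in the second.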

The interval matters. Shorter intervals detect failures faster but generate more traffic. 30 seconds is the conventional sweet spot: it sits well under the idle timeouts of most NATs and firewalls (typically five minutes or more), so the heartbeat traffic itself keeps the connection from being dropped as idle, and it is short enough that a dead connection doesn't waste server resources for long. Longer intervals (60-120 seconds) are sometimes used for very high-scale deployments to reduce traffic, but they delay failure detection.

Scaling beyond one server

If a message originating on the server side needs to reach a client, the message has to be routed to whatever instance currently holds that client's connection. With one server, this is trivial. With multiple servers, it's a pub-sub problem.

The simplest pattern is broadcast: every server subscribes to a shared channel (Redis pub-sub, NATS, RabbitMQ, or similar), and any server publishing to that channel reaches all servers. Each server then checks whether the message is for any of its connected clients and forwards it. This is operationally simple but scales poorly past low tens of servers because every server processes every message.

The next pattern up is targeted routing: a coordinator service tracks which clients are connected to which servers (typically in Redis with a short TTL), and senders publish messages directly to the target server's channel. This requires more bookkeeping but scales much better — each server only processes messages for its own clients.
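A sketch of the bookkeeping for targeted routing, using an in-memory dictionary as a stand-in for Redis so the example stays self-contained. The class name, the `ws:server:` channel prefix, and the 60-second TTL are all hypothetical. The point is the shape of the lookup: each server periodically refreshes entries for its own clients, and a sender resolves a client to a per-server channel before publishing.

```python
import time


class ConnectionRegistry:
    """Maps client ids to the server holding their connection, with expiry."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # client id -> (server id, expiry time)

    def heartbeat(self, client_id, server_id):
        """Record (or refresh) that `server_id` currently holds `client_id`."""
        self._entries[client_id] = (server_id, self._clock() + self._ttl)

    def lookup(self, client_id):
        """Return the server id for a client, or None if unknown or expired."""
        entry = self._entries.get(client_id)
        if entry is None:
            return None
        server_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[client_id]
            return None
        return server_id


def channel_for(registry, client_id):
    """Name of the per-server channel to publish to, or None if the client is offline."""
    server_id = registry.lookup(client_id)
    return None if server_id is None else f"ws:server:{server_id}"
```

The short TTL is what cleans up after a crashed server: its entries stop being refreshed and expire on their own, so senders stop publishing to a channel nobody is reading.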

The third pattern is per-tenant isolation: clients are sharded by tenant, and each tenant's connections live on a specific subset of servers. Cross-server traffic within a tenant becomes manageable because the relevant servers are a small known set. This pattern is harder to retrofit but scales the furthest, and it's what most large WebSocket deployments end up at eventually.
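One way to assign each tenant a small, stable subset of servers is rendezvous (highest-random-weight) hashing. This is a technique chosen for the illustration, not something the pattern above prescribes. The function name and the default of two servers per tenant are assumptions.

```python
import hashlib


def servers_for_tenant(tenant_id: str, servers: list[str], replicas: int = 2) -> list[str]:
    """Pick `replicas` servers for a tenant using rendezvous hashing.

    Each (tenant, server) pair gets a stable pseudo-random score; the tenant
    lives on the highest-scoring servers. Removing a server that was not
    selected leaves the tenant's assignment unchanged.
    """
    def score(server: str) -> bytes:
        return hashlib.sha256(f"{tenant_id}|{server}".encode("utf-8")).digest()

    return sorted(servers, key=score, reverse=True)[:replicas]
```

Because the result depends only on the tenant id and the server names, every component that needs to route a message can compute the same subset independently, without a lookup service.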

Message ordering and delivery semantics

WebSocket frames within a single connection are ordered by the protocol. Frames from the same sender will be received by the recipient in send order. This is a TCP guarantee inherited by WebSockets.

What WebSockets don't guarantee is anything about delivery semantics across reconnects. If a client disconnects after the server sent message 5 but before the client acknowledged it, the server has no way to know whether the client received the message. On reconnect, the server has to either resend the message (potentially producing a duplicate) or drop it (potentially losing data). Application protocols layered on top of WebSockets typically pick one of these and document the choice.

If the application requires exactly-once delivery semantics, both sides need to track sequence numbers and the application protocol needs to handle deduplication. This is the same set of problems as any message queue, and the same solutions apply — it doesn't get easier just because you're using WebSockets.
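On the receiving side, deduplication by sequence number can be as small as the sketch below. It assumes the sender numbers messages 1, 2, 3, and so on per client and resends anything unacknowledged after a reconnect; the class and method names are illustrative.

```python
class DedupingReceiver:
    """Applies each numbered message at most once, in order."""

    def __init__(self):
        self.last_applied = 0  # highest sequence number applied so far
        self.applied = []      # payloads in the order they were applied

    def receive(self, seq: int, payload: str) -> bool:
        """Apply a message if it is new. Returns False for a duplicate."""
        if seq <= self.last_applied:
            return False  # already applied: a resend after reconnect
        if seq != self.last_applied + 1:
            # A gap means something was lost; the caller should request a replay.
            raise ValueError(f"expected seq {self.last_applied + 1}, got {seq}")
        self.applied.append(payload)
        self.last_applied = seq
        return True
```

Combined with a sender that resends until acknowledged, this gives at-least-once delivery with duplicates filtered out, which is what "exactly-once" means in practice.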

The graceful shutdown problem

When a server instance is shutting down for a deployment, the standard graceful shutdown sequence is: stop accepting new connections, wait for existing connections to drain, exit. With WebSockets, "drain" is ambiguous because connections are persistent — a client might stay connected for hours. Waiting indefinitely isn't an option.

The pattern that works is: send a close frame to all connected clients with a status code indicating server-initiated close (RFC 6455 defines 1001, "Going Away", for exactly this case), give clients a brief window (a few seconds) to disconnect cleanly, and then forcibly close any remaining connections. The clients' reconnection logic, which they need anyway, kicks in, and the load balancer routes them to a different instance.
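The close-then-force sequence can be sketched with asyncio. The connection interface here is hypothetical: each connection is assumed to expose an async `close(code, reason)` that completes when the close handshake finishes, and a synchronous `abort()` that drops the socket. Real WebSocket libraries offer similar operations under their own names. Close code 1001 is the "Going Away" code from RFC 6455.

```python
import asyncio

GOING_AWAY = 1001  # RFC 6455 close code for an endpoint that is going away


async def drain_connections(connections, grace_seconds: float = 5.0) -> int:
    """Close every connection politely, then abort stragglers.

    Returns the number of connections that had to be aborted.
    """
    connections = list(connections)
    if not connections:
        return 0

    tasks = [
        asyncio.ensure_future(conn.close(code=GOING_AWAY, reason="server restarting"))
        for conn in connections
    ]
    # Wait up to the grace period for close handshakes to finish.
    _done, pending = await asyncio.wait(tasks, timeout=grace_seconds)

    aborted = 0
    for conn, task in zip(connections, tasks):
        if task in pending:
            task.cancel()
            conn.abort()  # hypothetical: forcibly drop the underlying socket
            aborted += 1

    # Collect results so cancelled or failed close tasks do not leak warnings.
    await asyncio.gather(*tasks, return_exceptions=True)
    return aborted
```

A shutdown handler would stop accepting new connections first, then await `drain_connections(...)` with a grace period of a few seconds before exiting.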

The corollary is that server deployment cadence matters. If the server restarts every few minutes, the constant disconnect-reconnect cycle is wasteful and visible to clients. If it restarts every few weeks, the disruption is rare enough not to matter. Most WebSocket-heavy services aim for deployment intervals of at least a day, often a week, to keep the reconnection rate low.

The five operational signals

The basic monitoring panel for a WebSocket service has five signals. Active connections per instance, with alerts on extreme imbalance. Connection establishment rate, which catches mass-disconnection events. Message rate per direction, which catches stuck connections. Heartbeat failure rate, which catches network problems. Connection lifetime distribution, which catches NAT-timeout issues that show up as a hard ceiling on connection durations.

The alerts that matter are: connection imbalance (one instance has 10x the connections of another), establishment rate spike (something is causing mass reconnects), message rate to zero on an active connection (stuck connection), and connection lifetime ceiling matching a likely NAT timeout (network path is killing connections).
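Two of these checks reduce to a few lines of arithmetic. The sketch below is illustrative only: the function names, the 10-second tolerance, and the idea of feeding in raw per-instance counts and connection lifetimes are assumptions, and a real deployment would express the same logic as queries in its monitoring system.

```python
def imbalance_ratio(connections_per_instance: dict[str, int]) -> float:
    """Ratio of the busiest instance's connection count to the quietest one's."""
    counts = list(connections_per_instance.values())
    if not counts:
        return 0.0
    return max(counts) / max(1, min(counts))  # avoid dividing by zero for idle instances


def should_alert_imbalance(connections_per_instance: dict[str, int], threshold: float = 10.0) -> bool:
    """True when one instance holds at least `threshold` times the connections of another."""
    return imbalance_ratio(connections_per_instance) >= threshold


def fraction_near_ceiling(lifetimes_seconds, ceiling_seconds: float, tolerance_seconds: float = 10.0) -> float:
    """Share of closed connections whose lifetime landed near a suspected timeout."""
    lifetimes = list(lifetimes_seconds)
    if not lifetimes:
        return 0.0
    near = sum(1 for t in lifetimes if abs(t - ceiling_seconds) <= tolerance_seconds)
    return near / len(lifetimes)
```

A high value from `fraction_near_ceiling` at, say, 300 seconds suggests that something on the network path is cutting connections at a five-minute idle timeout.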

The deeper observation

WebSockets work. They've worked for over a decade. The problems most teams encounter aren't with the protocol; they're with the assumption that a persistent connection will stay persistent and that all the operational concerns of HTTP — load balancing, scaling, deployment, reconnection, state management — somehow don't apply because the connection is open. They do apply. They just look different. The right mental model is that a WebSocket connection is a TCP connection with a thin framing layer on top, and TCP connections in the real world break constantly. Designing for that reality from the start produces a system that survives production. Designing for an idealized always-on connection produces a system that mostly works, except when it matters.
