You implement WebSockets. The connection works in development. In production, it drops after exactly 60 seconds of idle. You add reconnection logic. The connection reconnects and drops again after exactly 60 seconds. You search for "WebSocket connection drops 60 seconds." You find dozens of people with the same problem. None of them have a clean answer.
The problem is that there are three layers of infrastructure between your client and your server, and all three have a default timeout of somewhere between 60 and 100 seconds that nobody told you about.
Nginx: proxy_read_timeout
If you're proxying WebSocket connections through Nginx, the relevant directive is proxy_read_timeout. Its default is 60 seconds.
proxy_read_timeout defines the timeout for reading a response from the proxied server. For HTTP, this is the time to wait for a response after sending a request. For WebSockets, it becomes the maximum time allowed between messages. If the connection is idle—no data sent in either direction—for 60 seconds, Nginx closes it.
The fix:
location /ws {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s; # 1 hour
    proxy_send_timeout 3600s;
}
Setting this to a large value prevents Nginx from closing idle WebSocket connections. But this only addresses the Nginx layer. You may have others.
AWS ALB: idle_timeout
AWS Application Load Balancers have an idle timeout setting. The default is 60 seconds.
The ALB idle timeout applies to connections through the load balancer, including WebSocket connections. If no data is transmitted in either direction for 60 seconds, the ALB closes the connection. Unlike Nginx, this is not a proxy timeout—it's a connection-level timeout applied by the load balancer infrastructure itself.
The setting is configurable. In the AWS console, go to the load balancer attributes and change Idle timeout. The maximum is 4000 seconds. For WebSocket applications, setting this to at least 300-600 seconds is reasonable. Setting it very high is fine if your application can handle long-lived idle connections.
CLI:
aws elbv2 modify-load-balancer-attributes \
    --load-balancer-arn arn:aws:elasticloadbalancing:... \
    --attributes Key=idle_timeout.timeout_seconds,Value=3600
Cloudflare: WebSocket Timeout
Cloudflare has a 100-second WebSocket timeout. On free and Pro plans, this is not configurable.
The 100-second timeout applies to idle connections through Cloudflare's proxy. If no data, pings included, crosses the connection in either direction for 100 seconds, Cloudflare terminates it. Business and Enterprise plans can configure this limit.
This means that if you're on Cloudflare's free or Pro plan, you cannot extend the timeout through configuration. The only solution is application-level keepalives that prevent the connection from going idle.
The Three-Layer Stack
In a typical production deployment, a WebSocket connection might flow through:
- Cloudflare (CDN/proxy): 100s timeout, not configurable on free and Pro plans
- AWS ALB (load balancer): 60s default, configurable to 4000s
- Nginx (reverse proxy): 60s default, configurable
Each layer can terminate your connection independently. Fixing one doesn't fix the others. The effective timeout is the minimum across all layers.
If you're using Cloudflare with default ALB and Nginx settings, your WebSocket connections will drop after 60 seconds of idle: the ALB and Nginx defaults expire well before Cloudflare's 100-second limit.
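To make the "minimum across all layers" rule concrete, here is a minimal sketch. The layers array and the effectiveIdleTimeout helper are illustrative names, not part of any tool, and the numbers are simply the defaults quoted in this article. Swap in the values from your own deployment.

```javascript
// Idle timeouts in seconds for each hop, using the defaults quoted in this article.
// Replace these with the values from your own deployment.
const layers = [
  { name: 'Cloudflare', idleTimeoutSec: 100 },
  { name: 'AWS ALB', idleTimeoutSec: 60 },
  { name: 'Nginx', idleTimeoutSec: 60 },
];

// The connection survives only as long as the strictest layer allows.
function effectiveIdleTimeout(hops) {
  if (hops.length === 0) throw new Error('need at least one layer');
  const limit = Math.min(...hops.map((hop) => hop.idleTimeoutSec));
  // More than one layer can share the lowest value, so report all of them.
  const culprits = hops
    .filter((hop) => hop.idleTimeoutSec === limit)
    .map((hop) => hop.name);
  return { limit, culprits };
}

// With the defaults above, the limit is 60 and both the ALB and Nginx are culprits.
console.log(effectiveIdleTimeout(layers));
```

Raising only the Nginx value in this list would leave the limit at 60, because the ALB still closes the connection first.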
The Real Fix: Application-Level Ping/Pong
Configuring infrastructure timeouts addresses the proxies. But it doesn't make your application robust. The correct fix is application-level ping/pong that keeps connections alive regardless of infrastructure.
The WebSocket protocol includes built-in ping/pong frames (opcodes 0x9 and 0xA). Most server frameworks expose these. The server sends a ping; the client must respond with a pong. If no pong arrives within a timeout period, the server closes the connection.
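To see how small these frames are, here is a rough sketch of how an unmasked control frame is laid out on the wire per RFC 6455. You would not normally build frames by hand, since your WebSocket library does it for you. The encodeControlFrame helper is a hypothetical name used only for illustration, and it covers only the server-to-client case (frames sent by a client must also be masked).

```javascript
// Opcodes for the two control frames, from RFC 6455.
const OPCODE_PING = 0x9;
const OPCODE_PONG = 0xa;

// Build an unmasked control frame, as a server would send it.
// Client-to-server frames additionally need a masking key; that case is omitted.
function encodeControlFrame(opcode, payload = Buffer.alloc(0)) {
  if (payload.length > 125) {
    throw new RangeError('control frame payloads are limited to 125 bytes');
  }
  const header = Buffer.from([
    0x80 | opcode,  // FIN bit set, followed by the 4-bit opcode
    payload.length, // MASK bit clear, followed by the 7-bit payload length
  ]);
  return Buffer.concat([header, payload]);
}

const ping = encodeControlFrame(OPCODE_PING);
console.log(ping.toString('hex'), ping.length); // prints "8900 2"
```

An empty ping is two bytes, which is why keepalive traffic at a 30-second cadence is negligible next to almost any real payload.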
Server-side (Node.js with ws):
const ws = require('ws');

const server = new ws.Server({ port: 8080 });
const PING_INTERVAL = 30_000; // 30 seconds between pings

server.on('connection', function (socket) {
    socket.isAlive = true;
    // Any pong proves the peer was reachable since the last sweep.
    socket.on('pong', () => { socket.isAlive = true; });
});

// Each sweep terminates sockets that never answered the previous ping,
// then pings the rest. A dead peer is detected within two intervals.
const interval = setInterval(() => {
    server.clients.forEach(socket => {
        if (!socket.isAlive) {
            socket.terminate();
            return;
        }
        socket.isAlive = false;
        socket.ping();
    });
}, PING_INTERVAL);

// Stop the timer when the server shuts down.
server.on('close', () => clearInterval(interval));
With 30-second ping intervals, a connection is never idle for more than 30 seconds. This keeps it alive through Cloudflare's 100-second timeout, AWS ALB's 60-second default, and Nginx's 60-second default. The ping/pong traffic is minimal (a few bytes every 30 seconds) and also serves as a health check for the connection itself.
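Server pings keep the connection warm, but the client may also want to notice when the server has gone quiet. Browser WebSocket objects do not expose ping or pong frames to your code, so one option is an application-level heartbeat message. The sketch below is one possible shape for that logic, not a standard API. The createHeartbeat helper, its option names, and the { type: 'ping' } message are all assumptions for illustration, and your server would need to answer that message with something. Times are passed in explicitly so the logic can be driven by any timer.

```javascript
// Hypothetical helper: tracks when data was last seen and decides what to do next.
function createHeartbeat({ intervalMs = 30000, timeoutMs = 10000, startedAt = Date.now() } = {}) {
  let lastSeen = startedAt; // time of the most recent message from the server
  let pingSentAt = null;    // set while a ping is waiting for a reply

  return {
    // Call whenever any message arrives from the server.
    onMessage(now) {
      lastSeen = now;
      pingSentAt = null;
    },
    // Call periodically; returns 'idle', 'send-ping', 'waiting' or 'dead'.
    check(now) {
      if (pingSentAt !== null) {
        return now - pingSentAt >= timeoutMs ? 'dead' : 'waiting';
      }
      if (now - lastSeen >= intervalMs) {
        pingSentAt = now;
        return 'send-ping';
      }
      return 'idle';
    },
  };
}

// Wiring it to a browser WebSocket might look like this (not executed here):
//   const hb = createHeartbeat();
//   socket.onmessage = () => hb.onMessage(Date.now());
//   setInterval(() => {
//     const action = hb.check(Date.now());
//     if (action === 'send-ping') socket.send(JSON.stringify({ type: 'ping' }));
//     if (action === 'dead') socket.close();
//   }, 1000);
```

Keeping the decision logic separate from the socket makes it easy to test with fixed timestamps and to reuse with whichever client library you end up on.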
HAProxy: tunnel vs server Timeouts
If you're using HAProxy, the relevant distinction is between timeout tunnel and timeout server.
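timeout server applies to the server side of ordinary HTTP connections. HAProxy has no built-in value for it, but most configurations set it (along with timeout client) to well under a minute, which is short enough to close idle WebSocket connections. timeout tunnel applies specifically to bidirectional tunneled connections, including WebSockets after the upgrade.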
defaults
    timeout connect 5s
    timeout client 50s
    timeout server 50s
    timeout tunnel 3600s # For WebSocket tunnels
Without timeout tunnel, HAProxy keeps applying timeout client and timeout server to the connection after the upgrade, and those are typically too short. Once set, timeout tunnel supersedes both for upgraded connections, so setting it explicitly prevents HAProxy from closing WebSocket connections during idle periods.
What Reconnection Libraries Actually Do
Libraries like socket.io (with its transport fallback), reconnecting-websocket, and similar tools handle reconnection automatically when connections drop. This makes the symptom—dropped connections—less visible to users. It does not fix the underlying problem.
Every reconnect involves a new handshake, new authentication, and potential loss of in-flight state. Depending on your protocol, reconnection may cause duplicate message delivery, missed messages, or client state inconsistency. Reconnection libraries are useful, but they're a recovery mechanism, not a prevention mechanism.
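If your protocol can carry a sequence number, duplicates and gaps after a reconnect become detectable rather than silent. The sketch below assumes a hypothetical protocol in which every server message carries an increasing integer seq starting at 1. Neither that field nor the createSequenceTracker helper comes from any library.

```javascript
// Hypothetical protocol: every server message carries an increasing integer `seq`.
function createSequenceTracker() {
  let lastSeq = 0; // highest sequence number processed so far

  return {
    // Classify an incoming message relative to what has already been processed.
    accept(seq) {
      if (seq <= lastSeq) return 'duplicate'; // already handled, e.g. replayed after a reconnect
      const status = seq === lastSeq + 1 ? 'ok' : 'gap'; // a gap means messages were missed
      lastSeq = seq;
      return status;
    },
    // Value a client could send on reconnect to ask the server to resume from here.
    lastSeen() {
      return lastSeq;
    },
  };
}

const tracker = createSequenceTracker();
// Message 2 arrives twice, then messages 3 and 4 are lost across a reconnect.
console.log([1, 2, 2, 5].map((seq) => tracker.accept(seq)));
// prints [ 'ok', 'ok', 'duplicate', 'gap' ]
```

What to do on a gap (request a replay, refetch state, or just log it) depends on your application. The point is only that the client can tell.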
If you're relying on reconnection to handle drops every 60 seconds in production, you're paying for the overhead of constant reconnection when the correct fix is either extending timeouts or adding keepalives.
Debugging Timeout Issues
When a WebSocket drops at a suspiciously regular interval, the interval itself is diagnostic:
- ~60 seconds: ALB idle timeout (default) or Nginx proxy_read_timeout (default)
- ~100 seconds: Cloudflare WebSocket timeout
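- ~120 seconds: Some HAProxy configurations (HAProxy's limits are whatever timeout client and timeout server are set to, so check your config)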
If you see drops at consistent intervals that match infrastructure defaults, the problem is almost certainly timeout configuration, not your application code.
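If you log how long each connection sat idle before it dropped, a few lines of code can do this matching for you. The sketch below is a rough heuristic rather than a diagnostic tool. The diagnoseDrops function and its 3-second tolerance are assumptions chosen for illustration, and the table simply repeats the values from the list above.

```javascript
// Idle-timeout values from the list above, in seconds.
const KNOWN_TIMEOUTS = [
  { seconds: 60, suspects: ['AWS ALB idle timeout', 'Nginx proxy_read_timeout'] },
  { seconds: 100, suspects: ['Cloudflare WebSocket timeout'] },
  { seconds: 120, suspects: ['HAProxy (some configurations)'] },
];

// Given how long several connections sat idle before dropping (in seconds),
// report whether the drops look like a fixed timeout and which layers match.
function diagnoseDrops(idleBeforeDropSec, toleranceSec = 3) {
  if (idleBeforeDropSec.length < 2) return { regular: false, suspects: [] };
  const min = Math.min(...idleBeforeDropSec);
  const max = Math.max(...idleBeforeDropSec);
  // Random network failures produce scattered values; timeouts cluster tightly.
  if (max - min > toleranceSec) return { regular: false, suspects: [] };
  const mean = idleBeforeDropSec.reduce((sum, s) => sum + s, 0) / idleBeforeDropSec.length;
  const match = KNOWN_TIMEOUTS.find((entry) => Math.abs(entry.seconds - mean) <= toleranceSec);
  return { regular: true, suspects: match ? match.suspects : [] };
}

console.log(diagnoseDrops([60.1, 60.3, 59.9]));
console.log(diagnoseDrops([12, 340, 75]));
```

A regular interval with no matching suspect still points at a timeout somewhere, most likely a layer whose default someone has already changed.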
The fix sequence: add application-level ping/pong first (this works regardless of which layer is closing the connection), then configure infrastructure timeouts to match your expected idle periods.