HTTP Compression in APIs: Brotli, gzip, and the Trade-offs Nobody Mentions


HTTP compression is the kind of optimization that looks like a one-line config change and is almost never just a one-line config change. Enabling Brotli or gzip on a JSON API can cut response sizes by 70-90% — a real win for clients on slow connections and a real cost reduction in egress bandwidth at scale. It also creates a small set of failure modes that show up only in specific situations: payloads that are already compressed, sub-kilobyte responses where the compression header costs more than it saves, CPU saturation on small instances, and the webhook signature verification problem that nobody mentions until a customer's signature check fails in production.

This is the longer version of "what does compression for an HTTP API actually look like" — the algorithm choice that matters in 2026, the route-level decisions about what to compress, the operational metrics that catch the failure modes, and the signature-verification gotcha that turns into a customer-facing bug.

The Brotli-versus-gzip choice in 2026

gzip is the universal compatibility floor. Every HTTP client made in the last twenty-five years supports it, the CPU cost is well-characterized, and the compression ratios on JSON are reasonable: typically a 70-80% reduction on an ordinary API response. Brotli is the modern win, achieving 75-85% reduction on the same JSON with comparable decompression speed. Brotli compression is slower than gzip at high quality levels, which matters if you compress on every request rather than caching pre-compressed responses.

The right server configuration in 2026 is to negotiate based on the client's Accept-Encoding header, prefer Brotli when the client supports it, fall back to gzip otherwise, and skip compression entirely for clients that do not advertise either. Caddy, nginx, and most modern HTTP servers handle this negotiation natively. The configuration is one or two lines per route. The thinking that goes into it is a paragraph longer.
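
The negotiation rule above can be sketched in a few lines of Python. This is a minimal illustration, not how Caddy or nginx implement it: the names parse_accept_encoding and choose_encoding are hypothetical, the server always prefers Brotli over gzip whenever the client accepts it with a non-zero quality value, and the "*" wildcard is deliberately ignored as a conservative assumption.

```python
from typing import Optional

SUPPORTED = ("br", "gzip")  # server preference order: Brotli first, then gzip


def parse_accept_encoding(header: Optional[str]) -> dict:
    """Parse an Accept-Encoding value into {coding: q}. Returns {} if malformed."""
    if not header:
        return {}
    result = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        coding, _, params = part.partition(";")
        coding = coding.strip().lower()
        params = params.strip()
        q = 1.0
        if params:
            name, _, value = params.partition("=")
            if name.strip().lower() != "q":
                return {}  # unknown parameter: treat the whole header as unusable
            try:
                q = float(value.strip())
            except ValueError:
                return {}  # malformed quality value
            if not 0.0 <= q <= 1.0:
                return {}
        if not coding:
            return {}
        result[coding] = q
    return result


def choose_encoding(header: Optional[str]) -> str:
    """Pick br, then gzip, else identity (no compression)."""
    accepted = parse_accept_encoding(header)
    for coding in SUPPORTED:
        if accepted.get(coding, 0.0) > 0:
            return coding
    return "identity"


print(choose_encoding("gzip, deflate, br"))  # br
print(choose_encoding("gzip, deflate"))      # gzip
print(choose_encoding(None))                 # identity
```

A header that cannot be parsed falls through to "identity", so a malformed or stripped Accept-Encoding never triggers compression the client did not ask for.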

What not to compress

Compressing already-compressed content is wasted CPU and slightly larger output. PNG, JPEG, MP4, ZIP, and PDF responses should not be re-compressed at the HTTP layer. Most servers infer this from the response Content-Type and skip compression — verify your configuration does this before you ship, because a misconfigured server compressing every response wastes meaningful CPU at scale.

Sub-kilobyte responses are also usually not worth compressing. The compression headers and dictionary overhead can outweigh the saved bytes for very small payloads. The conventional cutoff is 1 KB minimum response size, configurable per route. JSON APIs often have small responses to status checks, ping endpoints, and lightweight CRUD operations — exempting these from compression reduces both server CPU and the chance of exotic compression bugs in obscure clients.
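
As a rough sketch of both exemptions, the following Python helper decides whether a response is worth compressing. The function name should_compress, the skip list, and the threshold constant are assumptions for illustration; real servers express the same rules in configuration rather than code.

```python
import gzip

MIN_SIZE_BYTES = 1024  # the 1 KB cutoff discussed above; tune per route

# Assumed skip list: media types whose payloads are already compressed.
ALREADY_COMPRESSED = {
    "image/png",
    "image/jpeg",
    "video/mp4",
    "application/zip",
    "application/pdf",
}


def should_compress(content_type: str, body_size: int) -> bool:
    """Skip already-compressed media types and bodies under the size cutoff."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ALREADY_COMPRESSED:
        return False
    return body_size >= MIN_SIZE_BYTES


# A tiny JSON body grows when gzipped, because the gzip header and trailer
# alone cost more than the payload can save.
tiny = b'{"ok":true}'
print(len(tiny), len(gzip.compress(tiny)))
```

The last two lines show why the size cutoff exists: for a body this small, the gzip output is larger than the input.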

The CPU-versus-bandwidth trade-off

Compression costs CPU on the server. At low traffic this is invisible. At high traffic on small instances, compression can become the bottleneck, especially with Brotli at high quality settings. The trade-off is real: you spend server CPU to save client bandwidth and reduce your own egress bill.

The honest framing is that compression has a fixed per-request overhead plus a CPU cost that grows with payload size, while the bytes saved depend on how large and how compressible the response is. For API endpoints returning small JSON payloads to high-volume callers, the fixed overhead dominates and the CPU-to-bandwidth ratio is unfavorable. For endpoints returning large lists or complex objects to lower-volume callers, compression is almost always net-positive. Run the math for your specific traffic profile rather than relying on defaults.
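
One way to run that math is to measure CPU time and bytes saved on payloads that resemble your own traffic. The sketch below uses zlib from the standard library as a stand-in for gzip (both use DEFLATE); the measure helper and the synthetic payloads are illustrative assumptions, and the timings depend entirely on the machine running them.

```python
import json
import time
import zlib


def measure(payload: bytes, level: int = 6):
    """Return (cpu_seconds, bytes_saved) for one zlib compression of payload."""
    start = time.process_time()
    compressed = zlib.compress(payload, level)
    cpu_seconds = time.process_time() - start
    return cpu_seconds, len(payload) - len(compressed)


# Synthetic payloads, purely illustrative: a status ping and a large list.
small = json.dumps({"status": "ok"}).encode()
large = json.dumps(
    [{"id": i, "name": f"item-{i}", "active": True} for i in range(5000)]
).encode()

for label, payload in (("small", small), ("large", large)):
    cpu, saved = measure(payload)
    print(f"{label}: {len(payload)} bytes in, {saved} bytes saved, {cpu:.6f}s CPU")
```

Multiplying the per-request figures by request volume per endpoint gives a first estimate of CPU spent against egress saved. A negative bytes_saved value means compression made the response larger.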

The mitigation that closes the loop is pre-compression: cache compressed responses for cacheable content. The CDN already does this for static assets. For dynamic API responses, an internal compressed-response cache keyed on the response body hash can amortize the compression cost across repeated requests for identical content.
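
A minimal sketch of such a cache, assuming an in-process LRU keyed on a SHA-256 of the uncompressed body. The class name CompressedResponseCache and its parameters are hypothetical; a production version would also need memory bounds in bytes and one entry per encoding.

```python
import gzip
import hashlib
from collections import OrderedDict


class CompressedResponseCache:
    """Small LRU cache of gzip-compressed bodies, keyed on a hash of the body."""

    def __init__(self, max_entries: int = 1024, level: int = 6):
        self._entries = OrderedDict()
        self._max_entries = max_entries
        self._level = level
        self.hits = 0
        self.misses = 0

    def get_compressed(self, body: bytes) -> bytes:
        key = hashlib.sha256(body).digest()
        cached = self._entries.get(key)
        if cached is not None:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return cached
        self.misses += 1
        compressed = gzip.compress(body, compresslevel=self._level)
        self._entries[key] = compressed
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return compressed


cache = CompressedResponseCache(max_entries=2)
body = b'{"items": [1, 2, 3]}' * 100
cache.get_compressed(body)
cache.get_compressed(body)
print(cache.hits, cache.misses)  # 1 1
```

Hashing the body is not free, so this only pays off when identical bodies recur often and hashing is cheaper than compressing, which is more likely at high Brotli quality levels than at low gzip levels.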

The signature verification problem

Webhook payloads are signed over the raw bytes of the body, and compression changes which bytes are on the wire. If the sender signs the compressed bytes and the receiver's HTTP framework decodes the body before signature verification runs, the check fails. If the sender signs the uncompressed body and the receiver verifies against the still-compressed wire bytes, it fails the same way. The two sides have to agree on which representation of the body the signature covers.

The convention that works is: senders sign the uncompressed body and document that the compression layer is transparent to signature verification. Receivers decompress first (their HTTP framework usually does this automatically), then verify the signature against the decompressed body, then parse. This requires the sender to be explicit in their webhook documentation about which body the signature covers. Stripe, GitHub, and other major webhook senders spell out exactly which bytes their signatures cover, because the alternative is customer-facing bugs that look like signature mismatches.
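
The decompress-then-verify order can be sketched as follows, assuming HMAC-SHA256 over the uncompressed body and gzip as the only content encoding. The function names, the secret, and the example payload are hypothetical; this is not any particular provider's signing scheme.

```python
import gzip
import hashlib
import hmac


def sign(secret: bytes, uncompressed_body: bytes) -> str:
    """Sender side: sign the uncompressed body."""
    return hmac.new(secret, uncompressed_body, hashlib.sha256).hexdigest()


def verify_webhook(
    secret: bytes, wire_body: bytes, content_encoding: str, signature: str
) -> bool:
    """Receiver side: decompress first, then verify, then (elsewhere) parse."""
    if content_encoding.lower() == "gzip":
        body = gzip.decompress(wire_body)
    else:
        body = wire_body
    expected = sign(secret, body)
    return hmac.compare_digest(expected, signature)  # constant-time comparison


secret = b"whsec_example"  # hypothetical shared secret
payload = b'{"event":"invoice.paid","id":"evt_123"}'
signature = sign(secret, payload)
wire = gzip.compress(payload)

print(verify_webhook(secret, wire, "gzip", signature))      # True
print(verify_webhook(secret, wire, "identity", signature))  # False
```

The second call reproduces the production bug: the receiver checks the signature against the compressed wire bytes, and a perfectly valid webhook looks forged. Many frameworks decompress the body before handler code runs, in which case the explicit gzip.decompress step is unnecessary.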

Across our products, webhook senders sign uncompressed payloads, and our WebhookVault capture-and-replay tool makes the decompressed body available so customers can verify signatures during integration testing. The compression layer is invisible to the signature pipeline by design.

Negotiation header subtleties

The Accept-Encoding header is supposed to be a comma-separated list of supported encodings with optional quality values. The reality is that some clients send malformed headers, some proxies strip the header, and some intermediaries modify it in ways that confuse content negotiation. The defensive approach is to treat any header that cannot be parsed as "no compression" rather than guessing.

The Vary: Accept-Encoding response header is critical for cache correctness. CDNs and shared caches must key the cached response by the encoding so they do not return a Brotli-compressed response to a client that only supports gzip. Forgetting this header is one of the classic compression bugs — the symptom is that some clients receive responses they cannot decode.
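
For application code that sets headers itself, a small helper can merge Accept-Encoding into an existing Vary value without clobbering it. This is a sketch under the assumption that response headers are a plain dict with a "Vary" key; add_vary is a hypothetical name, and most servers add the header automatically when their compression module is enabled.

```python
def add_vary(headers: dict, field: str = "Accept-Encoding") -> dict:
    """Return a copy of headers with `field` merged into Vary, without duplicates."""
    merged = dict(headers)
    existing = merged.get("Vary", "")
    fields = [f.strip() for f in existing.split(",") if f.strip()]
    if "*" in fields:
        return merged  # "Vary: *" already covers every request header
    if field.lower() not in (f.lower() for f in fields):
        fields.append(field)
    merged["Vary"] = ", ".join(fields)
    return merged


print(add_vary({"Content-Type": "application/json"})["Vary"])  # Accept-Encoding
print(add_vary({"Vary": "Origin"})["Vary"])                    # Origin, Accept-Encoding
```

Merging rather than overwriting matters because a response may already vary on Origin or other request headers, and dropping those reintroduces a different cache-correctness bug.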

What to monitor

The five operational signals that tell you compression is healthy: (1) compression ratio per Content-Type, where wide deviations from baseline indicate incorrect compression of binary content; (2) CPU spent on compression as a percentage of total CPU, where above 20% suggests over-compression; (3) per-Content-Encoding error rates, where elevated 5xx on Brotli requests but not gzip indicates a Brotli-specific bug; (4) cache hit rates per encoding, where an unexpected drop suggests the cache is fragmenting on un-normalized Accept-Encoding values, and client decode errors alongside healthy hit rates suggest a missing Vary header; (5) average response size before-and-after compression, sampled per endpoint, to catch degenerate cases where compression makes responses larger.
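
Signals (1) and (5) can be computed from sampled log records with a few lines of aggregation. The sketch below assumes each sample is a (content_type, raw_bytes, sent_bytes) tuple; that record shape and the compression_report name are illustrative, not the output format of any particular server.

```python
from collections import defaultdict


def compression_report(records):
    """Aggregate (content_type, raw_bytes, sent_bytes) samples per content type."""
    totals = defaultdict(lambda: {"raw": 0, "sent": 0, "inflated": 0, "count": 0})
    for content_type, raw_bytes, sent_bytes in records:
        entry = totals[content_type]
        entry["raw"] += raw_bytes
        entry["sent"] += sent_bytes
        entry["count"] += 1
        if sent_bytes > raw_bytes:
            entry["inflated"] += 1  # compression made this response larger
    return {
        ctype: {
            "ratio": 1 - e["sent"] / e["raw"] if e["raw"] else 0.0,
            "inflated": e["inflated"],
            "count": e["count"],
        }
        for ctype, e in totals.items()
    }


samples = [
    ("application/json", 10000, 2000),
    ("application/json", 100, 120),
    ("image/png", 5000, 5010),
]
for ctype, stats in compression_report(samples).items():
    print(ctype, stats)
```

A ratio near zero or below zero for a content type, or a non-zero inflated count, points at binary content or tiny responses being compressed when they should be exempt.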

None of these require fancy observability infrastructure. Most can be derived from access logs and standard server metrics; nginx, for example, exposes a $gzip_ratio variable for use in log formats. The discipline is checking them periodically rather than waiting for a complaint.

Where this matters across our products

All four of our products serve JSON responses with Brotli-and-gzip negotiation enabled at the Caddy layer. DocuMint serves PDF responses with no additional compression because PDFs are already compressed. CronPing and FlagBit have JSON-heavy APIs where compression saves significant egress bandwidth. WebhookVault records request bodies in their original encoded form for replay fidelity, but the dashboard surfaces a decompressed preview for inspection. Each product makes the compression choice at the route level rather than relying on a global default that does the wrong thing somewhere.

The deeper observation, which applies beyond compression specifically, is that the easy-looking optimizations almost always have a small set of edge cases that the marketing material does not mention. Compression is a real win when it is configured deliberately. It is also a real source of confusing production bugs when it is enabled by default and forgotten about.
