The Hidden Cost of Logging: When Your Observability Stack Becomes Your Bottleneck
Logging looks free until it isn't. The performance cost shows up at scale, the storage cost compounds with retention, and the cognitive cost of bad logging is the worst of all. Most teams underinvest in deciding what to log and overinvest in tooling for the noise.
Logging is the kind of infrastructure that looks free in development and turns into a meaningful cost center in production. A small SaaS that emits 100 log lines per request and serves a million requests per day generates 100 million log lines daily. At 200 bytes per line, that's 20GB a day; at 30 days retention, that's 600GB held at any given time. Raw ingest at around five cents per gigabyte comes to only about $30 a month. But managed observability platforms also charge for indexing, querying, and hot-tier retention, often at rates far above raw ingest, and those charges can push the monthly bill past a thousand dollars before any real scale. The teams that hit this wall didn't make a deliberate decision to log everything; they made a series of small decisions that each looked harmless, and the volume compounded.
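The volume arithmetic above is easy to check in a few lines. This is a minimal sketch that uses the paragraph's own figures. The function names are illustrative, and the model deliberately covers only raw ingest at a flat per-gigabyte price. It leaves out indexing, query, and storage charges.

```python
# Back-of-the-envelope log volume model using the example figures from the text.
# Only raw ingest at a flat per-GB price is modelled; indexing, query, and
# hot-storage charges on a managed platform are deliberately left out.


def daily_log_gb(requests_per_day: int, lines_per_request: int, bytes_per_line: int) -> float:
    """Raw log volume per day in decimal gigabytes (1 GB = 1e9 bytes)."""
    return requests_per_day * lines_per_request * bytes_per_line / 1e9


def monthly_ingest_cost(daily_gb: float, price_per_gb: float, days: int = 30) -> float:
    """Ingest cost for one month at a flat per-GB price."""
    return daily_gb * days * price_per_gb


daily = daily_log_gb(requests_per_day=1_000_000, lines_per_request=100, bytes_per_line=200)
stored = daily * 30  # steady-state volume held under 30-day retention
ingest = monthly_ingest_cost(daily, price_per_gb=0.05)

print(f"{daily:.0f} GB/day, {stored:.0f} GB retained, ${ingest:.2f}/month ingest")
# -> 20 GB/day, 600 GB retained, $30.00/month ingest
```

The raw ingest line comes to about $30 a month. The rest of a managed-platform bill comes from the charges this sketch leaves out.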
The patterns in this post matter for any system at the scale where logging cost has become noticeable. We tune logging deliberately on DocuMint, CronPing, FlagBit, and WebhookVault because the alternative is paying more for log storage than for the rest of the infrastructure combined.
The performance cost
The naive view of logging is that it's a cheap operation: format a string, write to stderr, move on. The reality is that production log calls do significantly more work. A structured log entry typically involves serializing context dictionaries, formatting timestamps, evaluating log level filters, possibly invoking custom processors, and writing to a destination that may be the local disk, a pipe to a log shipper, or a network socket.
For a Python service using the standard logging library with a JSON formatter and a few processors (request ID, user ID, timestamp), each log call typically takes 50-200 microseconds. That sounds negligible until you note that a typical request might emit 5-10 log lines, adding roughly 0.25-2ms of latency that wouldn't otherwise exist. For a service whose p99 latency target is 50ms, logging is now consuming 0.5-4% of the latency budget on every request, which is a meaningful chunk.
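The 50-200 microsecond figure depends heavily on the formatter, the processors, and the destination, so it is worth measuring on your own stack. This is a minimal sketch using only the standard library. The JsonFormatter class, the logger name, and the request_id field are illustrative assumptions. Writing to an in-memory StringIO isolates formatting cost from disk or network I/O, so real-world numbers will be higher.

```python
import io
import json
import logging
import timeit


class JsonFormatter(logging.Formatter):
    """Illustrative JSON formatter: one object per line with a few fixed fields."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": record.getMessage(),
            # request_id arrives via extra=...; default to None if a call omits it
            "request_id": getattr(record, "request_id", None),
        })


def build_logger(stream: io.StringIO) -> logging.Logger:
    """Create an isolated INFO logger that writes JSON lines to `stream`."""
    logger = logging.getLogger("latency-probe")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the benchmark away from the root logger's handlers
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers[:] = [handler]
    return logger


sink = io.StringIO()  # in-memory sink: measures formatting overhead, not disk or network I/O
log = build_logger(sink)
calls = 10_000
seconds = timeit.timeit(
    lambda: log.info("invoice.generated", extra={"request_id": "abc123"}),
    number=calls,
)
per_call_us = seconds / calls * 1e6
print(f"{per_call_us:.1f} us per call")
print(f"{5 * per_call_us / 1000:.3f}-{10 * per_call_us / 1000:.3f} ms for a request that logs 5-10 lines")
```

Swapping in your real handler and processors turns this into a quick check of how much of the latency budget logging actually consumes.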
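The cost compounds when log calls are inside hot paths. A naive logger.debug() call inside a per-row loop over a database result can add seconds to a batch operation even when debug output is disabled. Python evaluates a call's arguments before logging checks the level, so an f-string message or an expensive argument is built on every iteration. If the level is filtered at the handler rather than at the logger, every call also creates a full LogRecord that is then thrown away. There are two standard mitigations. The first is passing arguments %-style (logger.debug("row %s", row)), so formatting is deferred until a record is actually emitted. The second is checking logger.isEnabledFor(logging.DEBUG) before computing anything expensive. Both are tedious to apply consistently.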
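The difference between eager and deferred formatting is easy to demonstrate. This sketch uses only the standard logging module. The Row class is a hypothetical stand-in for a database row with an expensive string representation, and it counts how often that representation is actually built.

```python
import logging

logger = logging.getLogger("batch")
logger.setLevel(logging.INFO)  # DEBUG is disabled at the logger itself


class Row:
    """Hypothetical row whose string form is expensive; counts renders."""

    renders = 0

    def __str__(self) -> str:
        Row.renders += 1
        return "Row(...)"


rows = [Row() for _ in range(1_000)]

# Eager: the f-string is built before logger.debug() even checks the level.
for row in rows:
    logger.debug(f"processing {row}")
eager = Row.renders

# Deferred: %-style arguments are only rendered if a record is actually emitted.
Row.renders = 0
for row in rows:
    logger.debug("processing %s", row)
lazy = Row.renders

# Guarded: skip building an expensive summary entirely when DEBUG is off.
Row.renders = 0
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("batch summary: %s", ", ".join(str(r) for r in rows))
guarded = Row.renders

print(eager, lazy, guarded)  # -> 1000 0 0
```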
The deeper issue is synchronous flushing. Some logging configurations flush every line to disk as soon as it is written, for durability. That is the right behavior for audit-critical logs. But it turns every log call into a synchronous I/O operation that can add milliseconds of latency when the disk or pipe is slow. Most production logging should be asynchronous, with batched writes, and in most stacks that takes explicit configuration rather than being the default.
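In Python's standard library, the usual way to take the write off the request path is QueueHandler plus QueueListener. This is a minimal sketch, assuming an in-process queue is acceptable. Records still in the queue are lost if the process crashes, which is why audit logs may keep synchronous writes. The logger name and format string are illustrative, and a StringIO stands in for the real file or socket.

```python
import io
import logging
import logging.handlers
import queue

# Unbounded in-process queue between request threads and the writer thread.
log_queue: queue.Queue = queue.Queue(-1)

# The real (slow) destination; a StringIO stands in for a file or socket here.
sink = io.StringIO()
file_handler = logging.StreamHandler(sink)
file_handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

# Request threads only enqueue records; they never touch the destination.
logger = logging.getLogger("api")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(log_queue))

# A background thread drains the queue and performs the actual writes.
listener = logging.handlers.QueueListener(log_queue, file_handler, respect_handler_level=True)
listener.start()

logger.info("invoice.generated")
logger.warning("invoice.slow")

listener.stop()  # processes any queued records, then joins the writer thread
print(sink.getvalue())
```

QueueHandler still formats the message in the calling thread before enqueuing it. What moves to the background thread is the write itself. For batching, logging.handlers.MemoryHandler can sit in front of the destination and flush records in groups.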
The storage cost
Storage costs for logs are linear in volume and retention. The volume part is obvious; the retention part is where the surprise is. Most teams set retention based on a hand-wavy intuition about how long they might want to look at old logs. The result is retention values that are either too short (the one time you need to investigate something from six weeks ago, the data is gone) or too long (you're paying to store logs that nobody has touched in months).
The more disciplined approach is to categorize logs by audience and retain each category according to its actual access pattern. Operational logs (request logs, error logs, performance counters) are accessed within hours or days of the event; 14-day retention covers most operational debugging. Security logs (authentication, authorization, audit) might need 90-day retention or longer for compliance reasons. Application logs (business events, user actions) might need long retention for product analytics, but those should usually be in a structured event store rather than in the log stream.
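One way to make the categories explicit is a small policy table keyed by category, with a rule for deciding which category a log line belongs to. This is a sketch under stated assumptions. The logger-name prefix convention, the prefixes themselves, and the function names are hypothetical, and the day counts are simply the figures from the paragraph. Unknown loggers default to the shortest retention, which keeps costs bounded but means a security logger must be registered explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionPolicy:
    category: str
    retention_days: int


# Illustrative policy using the figures above; compliance may require longer.
# Business events are deliberately absent: they belong in an event store.
POLICIES = {
    "operational": RetentionPolicy("operational", 14),
    "security": RetentionPolicy("security", 90),
}

# Hypothetical convention: the logger-name prefix decides the category.
CATEGORY_BY_PREFIX = {
    "http.": "operational",
    "db.": "operational",
    "auth.": "security",
    "audit.": "security",
}


def category_for(logger_name: str) -> str:
    """Map a logger name to a retention category, defaulting to operational."""
    for prefix, category in CATEGORY_BY_PREFIX.items():
        if logger_name.startswith(prefix):
            return category
    return "operational"


def is_expired(logger_name: str, age_days: int) -> bool:
    """True once a log line from `logger_name` is older than its category allows."""
    return age_days > POLICIES[category_for(logger_name)].retention_days
```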
The tiered storage pattern handles this naturally. Hot tier (the last 7-14 days) lives in fast queryable storage — whatever your log platform offers. Warm tier (15-90 days) lives in slower storage that's still queryable but cheaper. Cold tier (90+ days for compliance) lives in object storage like S3 with longer retrieval times. The cost ratio between tiers is typically 10x to 100x, so moving older logs out of the hot tier produces immediate savings.
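The savings from tiering follow directly from how many gigabytes sit in each tier at steady state. This sketch compares keeping everything hot against a 14-day hot, 90-day warm, cold-after-that layout. The per-GB-month prices are hypothetical, picked only to reflect the 10x-100x spread described above, and the 180-day horizon is an assumed compliance requirement. Substitute your own vendor's rates.

```python
# Hypothetical per-GB-month prices, chosen only to reflect the 10x-100x spread
# between tiers described above; substitute your own vendor's rates.
PRICE_PER_GB_MONTH = {"hot": 1.00, "warm": 0.10, "cold": 0.01}


def stored_gb_by_tier(daily_gb: float, hot_days: int, warm_until: int, total_days: int) -> dict:
    """Steady-state GB held in each tier: hot for the first `hot_days`,
    warm until day `warm_until`, cold until `total_days`."""
    return {
        "hot": daily_gb * hot_days,
        "warm": daily_gb * (warm_until - hot_days),
        "cold": daily_gb * (total_days - warm_until),
    }


def monthly_storage_cost(gb_by_tier: dict) -> float:
    """Sum of stored GB times the per-GB-month price of each tier."""
    return sum(gb * PRICE_PER_GB_MONTH[tier] for tier, gb in gb_by_tier.items())


daily_gb = 20.0   # the example service from the opening section
total_days = 180  # assumed compliance horizon for this illustration

all_hot = monthly_storage_cost({"hot": daily_gb * total_days})
tiered = monthly_storage_cost(
    stored_gb_by_tier(daily_gb, hot_days=14, warm_until=90, total_days=total_days)
)

print(f"all hot: ${all_hot:,.2f}/month, tiered: ${tiered:,.2f}/month")
# -> all hot: $3,600.00/month, tiered: $450.00/month
```

Most of the saving comes from moving days 15-90 out of the hot tier. The cold tier is cheap enough that long compliance horizons barely move the total.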
The cognitive cost
The largest cost of logging isn't measured in dollars or microseconds; it's measured in engineering attention during incidents. A log stream that emits 10000 lines per minute during normal operation is unsearchable during an incident — by the time an engineer has filtered down to the relevant subset, minutes have passed and the situation has changed.
The solution is logging discipline at write time, not at query time. The right split is INFO for routine operations at a reasonable cadence (one or two lines per request, not ten), DEBUG for detailed traces that stay off in production, and WARNING/ERROR for genuinely abnormal events. The wrong approach is INFO for everything because "we might want to know"; that produces volume that masks the actual signals.
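A simple way to hold to one or two lines per request is to collect request details as you go and emit a single summary line at the end. This is a minimal sketch. The request_log helper, the event names, and the 500ms slow threshold are hypothetical. Routine completions go to INFO, slow requests to WARNING, failures to ERROR with a traceback, and step-by-step detail stays at DEBUG.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("documint-api")


@contextmanager
def request_log(route: str, slow_ms: float = 500.0):
    """Hypothetical helper: collect fields during a request, then emit ONE summary line.

    Routine completion logs at INFO, slow requests at WARNING, failures at ERROR
    (with traceback). Step-by-step detail should use logger.debug(), which is
    off in production.
    """
    fields = {"route": route}
    start = time.perf_counter()
    try:
        yield fields
    except Exception:
        fields["duration_ms"] = round((time.perf_counter() - start) * 1000, 1)
        logger.exception("request.failed %s", fields)
        raise
    fields["duration_ms"] = round((time.perf_counter() - start) * 1000, 1)
    level = logging.WARNING if fields["duration_ms"] > slow_ms else logging.INFO
    logger.log(level, "request.completed %s", fields)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with request_log("/invoices") as fields:
        fields["invoice_id"] = 42          # context accumulates instead of becoming extra lines
        logger.debug("template rendered")  # detail: dropped at INFO
```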
Structured logging helps with the cognitive cost when it's done well and hurts when it's done badly. Done well, structured logging means consistent field names across services, consistent field types, and queryable structured data. Done badly, structured logging means every developer adds the fields they think are interesting at the moment, the field names diverge across services, and the structured data is more verbose than free-text logs without being more queryable. The discipline of a small fixed schema, applied across all services, is the difference.
The five fields that matter
Most log lines need only five fields to be useful: timestamp, severity, service identifier, request or trace ID, and the message. Everything else should be in structured context attached to the message. The minimum viable schema:
```json
{
  "ts": "2026-05-10T10:00:00.123Z",
  "level": "INFO",
  "service": "documint-api",
  "trace_id": "abc123",
  "msg": "invoice.generated",
  "context": { "invoice_id": 42, "duration_ms": 35 }
}
```

The field discipline is more important than the specific schema. Every log line has every field. The trace_id field is the single highest-leverage primitive for connecting log entries across services and connecting logs to traces. The msg field is the event name, not the human-readable description, so make it stable and grep-friendly.
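A formatter can enforce the schema so that every line carries every field, whoever wrote the call site. This is a minimal sketch with the standard library, not a drop-in for any particular platform. The SchemaFormatter class, the trace_id_var context variable, and the convention of passing extra data as extra={"context": ...} are illustrative assumptions. Request middleware would normally set the trace ID once per request.

```python
import contextvars
import json
import logging
from datetime import datetime, timezone

# Set once per request (e.g. by middleware); every log call in that context reads it.
trace_id_var = contextvars.ContextVar("trace_id", default="-")


class SchemaFormatter(logging.Formatter):
    """Render every record with exactly the same six keys, in the same order."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        # Everything beyond the fixed fields travels in one nested object.
        context = dict(getattr(record, "context", {}))
        if record.exc_info:
            context["exc"] = self.formatException(record.exc_info)
        return json.dumps({
            "ts": ts.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname,
            "service": self.service,
            "trace_id": trace_id_var.get(),
            "msg": record.getMessage(),  # a stable event name such as "invoice.generated"
            "context": context,
        })


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(SchemaFormatter("documint-api"))
    log = logging.getLogger("documint")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    trace_id_var.set("abc123")
    log.info("invoice.generated", extra={"context": {"invoice_id": 42, "duration_ms": 35}})
```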
What not to log
The flip side of "log thoughtfully" is "don't log carelessly." The list of things that should not be in logs is short but consequential: passwords, API keys, session tokens, credit card numbers, anything covered by GDPR personal data restrictions, full request bodies that might contain user-uploaded content, full response bodies that might contain user data. The most common violation is logging full HTTP request payloads "for debugging," which means PII flows into the log pipeline and is now subject to the deletion obligations of GDPR or CCPA without anyone noticing.
The standard mitigation is field-level redaction at the logging layer. Pydantic's SecretStr keeps a value masked when it is printed or serialized to JSON. Separately, structlog processors or standard-library logging filters can recognize sensitive field names and replace their values with [REDACTED] before the log line is serialized. Redaction needs to be applied consistently: partial redaction is worse than no redaction because it gives a false sense of safety.
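A redaction step is just a function applied to the structured event before rendering. This is a minimal sketch written with structlog's processor signature, (logger, method_name, event_dict) returning the event dict. It has no structlog import, so it runs standalone. The deny-list of key names is an illustrative assumption. Matching on key names will not catch a secret pasted into a free-text message, which is one more reason to keep msg a stable event name.

```python
from typing import Any

# Illustrative deny-list; extend it to match your own field names.
SENSITIVE_KEYS = frozenset({
    "password", "api_key", "authorization", "session_token", "card_number",
})
REDACTED = "[REDACTED]"


def _redact(value: Any) -> Any:
    """Recursively replace values of sensitive keys inside dicts and lists."""
    if isinstance(value, dict):
        return {
            k: REDACTED if str(k).lower() in SENSITIVE_KEYS else _redact(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [_redact(v) for v in value]
    return value


def redact_sensitive(logger: Any, method_name: str, event_dict: dict) -> dict:
    """Processor with structlog's (logger, method_name, event_dict) signature.

    Place it before the renderer in the processor chain so nothing sensitive
    reaches serialization.
    """
    return _redact(event_dict)
```

In a structlog setup this would sit ahead of the renderer, for example processors=[redact_sensitive, structlog.processors.JSONRenderer()].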
Sampling for high-volume services
At a certain scale, the right answer is to log fewer events. A service that handles 10000 requests per second cannot reasonably log every request at INFO level — the volume is unmanageable and the per-request value is low. The pattern is sampling: log every nth successful request, log every error, log every slow request. The headline metric (request rate, error rate, latency distribution) goes to a metrics system; the per-request log goes to a sampled log stream.
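The sampling rule described above maps naturally onto a logging filter attached to the request-log handler. This is a minimal sketch. The SamplingFilter name, the status and duration_ms record attributes (passed via extra=), and the default thresholds are assumptions. The headline metrics still need to be counted unsampled, in the metrics system.

```python
import logging
import threading


class SamplingFilter(logging.Filter):
    """Keep every warning/error, every 5xx or slow request, and 1 in N of the rest.

    Assumes request records carry `status` and `duration_ms` attributes,
    e.g. logger.info("request.completed", extra={"status": 200, "duration_ms": 12}).
    """

    def __init__(self, keep_one_in: int = 100, slow_ms: float = 1000.0) -> None:
        super().__init__()
        self.keep_one_in = keep_one_in
        self.slow_ms = slow_ms
        self._seen = 0
        self._lock = threading.Lock()  # handlers may be called from many threads

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True  # abnormal events are never sampled away
        if getattr(record, "status", 0) >= 500:
            return True
        if getattr(record, "duration_ms", 0.0) >= self.slow_ms:
            return True
        with self._lock:  # only routine successes reach the 1-in-N counter
            keep = self._seen % self.keep_one_in == 0
            self._seen += 1
        return keep
```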
Adaptive sampling is an extension of this: when the error rate goes up, the sampling rate goes up to give more visibility. When everything is normal, the sampling rate goes back down. Most modern observability platforms support this in some form.
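Platforms implement adaptive sampling in their own ways. The core idea fits in a small class. This is a hypothetical sketch, not any vendor's algorithm. It tracks request outcomes over a sliding time window and raises the sample rate linearly from a base rate toward 1.0 as the recent error rate approaches a threshold. The class name, defaults, and linear interpolation are all assumptions.

```python
import collections
import random
import time
from typing import Callable, Optional


class AdaptiveSampler:
    """Sample more of the traffic as the recent error rate climbs.

    Hypothetical sketch: tracks outcomes over a sliding time window and
    interpolates linearly from `base_rate` (no errors) to 1.0 (error rate at or
    above `full_at_error_rate`).
    """

    def __init__(self, base_rate: float = 0.01, full_at_error_rate: float = 0.05,
                 window_s: float = 60.0) -> None:
        self.base_rate = base_rate
        self.full_at_error_rate = full_at_error_rate
        self.window_s = window_s
        self._events = collections.deque()  # (timestamp, is_error), oldest first
        self._errors = 0

    def record(self, is_error: bool, now: Optional[float] = None) -> None:
        """Register one request outcome and drop outcomes older than the window."""
        now = time.monotonic() if now is None else now
        self._events.append((now, is_error))
        self._errors += int(is_error)
        while self._events and self._events[0][0] < now - self.window_s:
            _, old_error = self._events.popleft()
            self._errors -= int(old_error)

    def error_rate(self) -> float:
        return self._errors / len(self._events) if self._events else 0.0

    def sample_rate(self) -> float:
        pressure = min(1.0, self.error_rate() / self.full_at_error_rate)
        return self.base_rate + (1.0 - self.base_rate) * pressure

    def should_log(self, rng: Callable[[], float] = random.random) -> bool:
        """Decide for one successful request; errors should always be logged."""
        return rng() < self.sample_rate()
```

Errors themselves bypass the sampler entirely. The sampler only decides how much of the surrounding successful traffic to keep for context.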
The five operational signals
The monitoring panel for the logging system itself is small but useful. Log volume per service, with alerts on order-of-magnitude jumps, which usually indicate a runaway loop. Log ingest lag, which catches a stuck pipeline. The logging pipeline's own error rate (dropped, rejected, or unparseable lines), kept separate from application error rates, because errors about logging aren't the same as logs about errors. Storage growth rate against retention, which catches misconfigured retention. Log query rate, which catches logs that teams are paying to store but never query.
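The first signal, an order-of-magnitude jump in volume, reduces to a ratio against a baseline. This is a minimal sketch. The function name, the 10x factor, and the lines-per-minute inputs are assumptions, and in practice this logic usually lives in the metrics system's alert rules rather than in application code. The sketch also flags collapses toward zero, which can point at a stuck shipper rather than a quiet service.

```python
import math


def volume_alerts(baseline: dict, current: dict, factor: float = 10.0) -> list:
    """Flag services whose log volume moved by `factor` or more in either direction.

    `baseline` and `current` map service name -> log lines per minute. A jump up
    suggests a runaway loop; a collapse toward zero can mean a stuck shipper.
    """
    alerts = []
    for service in sorted(set(baseline) | set(current)):
        base = baseline.get(service, 0.0)
        rate = current.get(service, 0.0)
        if base <= 0:
            if rate > 0:
                alerts.append((service, math.inf))  # new or previously silent service
            continue
        ratio = rate / base
        if ratio >= factor or ratio <= 1 / factor:
            alerts.append((service, ratio))
    return alerts
```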
The deeper observation
Logs are a substitute for thinking about what should be observable. The lazy version of observability is to log everything and search later. The disciplined version is to identify the events that matter, instrument those, and let the rest of the operational visibility come from metrics and traces. The discipline pays off more than the tooling: a service with thoughtful 5-line-per-request logs and a metrics dashboard is more debuggable than a service with 50-line-per-request logs and no metrics. The cost of the lazy version is paid mostly by the people who weren't on the team when the logs were instrumented and who now have to make sense of the noise during an incident at 3 AM. They will remember.