Memory Leaks in Long-Running Python Services: A Diagnostic Field Guide

Python is a garbage-collected language, which is supposed to mean that memory leaks are someone else's problem. In a long-running service, that promise is technically true and practically misleading. The reference-counting collector reclaims memory when reference counts drop to zero, the cycle detector handles cycles between Python objects, and most code does in fact release memory as expected. But long-running Python services do leak, and the leaks compound until the OOM killer arrives.

The leaks have a small set of typical causes. Knowing them turns a multi-day investigation into a multi-hour one.

What a Python memory leak actually looks like

The signature is monotonic resident set size growth over time, with no corresponding increase in workload. A FastAPI service running at 50 requests per second sits at 200 MB after deployment, climbs to 400 MB by the end of day one, and to 800 MB by the end of day three. The growth is steady. There is no sudden spike that maps to a particular event. The shape of the curve looks like the slow accumulation of unreleased objects.

This is different from a workload-driven memory increase, where memory grows when traffic grows and shrinks, or at least plateaus, when it falls. The signature of a leak is that memory does not come back down when traffic falls: each quiet period settles at a higher floor than the one before. The mental model is: something is being allocated on every request, but only some fraction is being released, so the unreleased fraction accumulates.
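
One rough way to tell the two shapes apart is to compare the low-water mark of RSS across successive windows, for example one window per day. The sketch below assumes you already have a list of RSS samples in bytes from your metrics system; the function names and the windowing scheme are illustrative, not a standard method.

```python
def window_floors(rss_samples, window):
    """Return the minimum RSS seen in each consecutive window of samples."""
    return [
        min(rss_samples[i:i + window])
        for i in range(0, len(rss_samples), window)
    ]


def floor_is_ratcheting(floors, min_step_bytes=0):
    """True when every window's floor is higher than the previous window's."""
    if len(floors) < 2:
        return False
    return all(b - a > min_step_bytes for a, b in zip(floors, floors[1:]))


MB = 1024 * 1024
# Workload-driven: memory rises with traffic and returns to the same floor.
workload = [200 * MB, 400 * MB, 200 * MB, 400 * MB, 200 * MB, 400 * MB]
# Leak-shaped: every quiet period settles higher than the last.
leaking = [200 * MB, 450 * MB, 300 * MB, 550 * MB, 400 * MB, 650 * MB]
```

A nonzero min_step_bytes filters out noise; the right value depends on how jittery the service's baseline is.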

The five common causes

The first cause, and by far the most common, is unbounded caches. Someone added a dictionary at module level keyed by some user-supplied value, intending it as a cache. The cache has no eviction policy. Every unique key adds an entry. After enough unique keys, the cache holds gigabytes. The fix is to use functools.lru_cache with a maxsize, or cachetools.TTLCache, or any of the dozens of bounded cache implementations. The discipline is that any cache at module scope must have a defined maximum size.
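
A minimal sketch of the bounded version, using functools.lru_cache from the standard library. The load_profile function and its return value are hypothetical stand-ins for whatever expensive lookup the cache fronts.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)  # at most 1024 entries; least recently used go first
def load_profile(user_id: str) -> dict:
    # Hypothetical expensive lookup, stubbed out for the example.
    return {"user_id": user_id, "name": f"user-{user_id}"}


# Simulate many distinct user-supplied keys hitting the cache.
for i in range(5000):
    load_profile(str(i))

info = load_profile.cache_info()
```

Note that maxsize=None turns lru_cache into exactly the unbounded dictionary described above, so the argument has to be an actual number.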

The second cause is reference cycles holding objects with C-level resources. The cycle detector reclaims cycles between Python objects, but only when it runs, so objects that wrap C resources (file handles, sockets, native buffers) hold those resources until their generation is next collected instead of releasing them promptly. Before Python 3.4, cycles containing objects with __del__ methods were not collected at all; since PEP 442 they are, but finalizer order within a cycle is undefined and a finalizer can resurrect its object. The fix is to avoid __del__ in favor of context managers, and to use weakref for back-references that should not keep objects alive.
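
The weakref half of that fix can be sketched as follows. Connection and Handler are hypothetical classes; the point is only that the child refers to its parent weakly, so no cycle forms and reference counting alone can free the parent.

```python
import weakref


class Handler:
    def __init__(self, connection):
        # Weak back-reference: does not keep the connection alive.
        self._connection = weakref.ref(connection)

    @property
    def connection(self):
        # Returns the Connection, or None once it has been freed.
        return self._connection()


class Connection:
    def __init__(self, name):
        self.name = name
        self.handlers = []  # strong references, parent -> child only

    def add_handler(self):
        handler = Handler(self)
        self.handlers.append(handler)
        return handler
```

On CPython, dropping the last strong reference to a Connection frees it immediately, and any surviving Handler then sees connection as None, so callers need to handle that case.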

The third cause is logging or telemetry that retains references to large objects. Some logging libraries, when configured with certain handlers, queue records and never let them go. Some APM agents add metadata to every request that includes references to request bodies. The signature is memory that grows in proportion to request volume, even after the request is complete. The fix is to configure log handlers to drop records when overloaded, and to audit APM/tracing setup for request-scoped retention.
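
One way to get drop-on-overload behavior from the standard library is a bounded queue.Queue behind a logging.handlers.QueueHandler whose enqueue method discards records when the queue is full. The class name, the dropped counter, and the make_bounded_logger helper below are assumptions made for illustration, not part of the logging module.

```python
import logging
import logging.handlers
import queue


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """QueueHandler that counts and discards records when its queue is full."""

    def __init__(self, log_queue):
        super().__init__(log_queue)
        self.dropped = 0

    def enqueue(self, record):
        try:
            self.queue.put_nowait(record)
        except queue.Full:
            self.dropped += 1  # shed load instead of growing memory


def make_bounded_logger(name, max_records):
    """Hypothetical helper: a logger whose buffer holds at most max_records."""
    log_queue = queue.Queue(maxsize=max_records)
    handler = DroppingQueueHandler(log_queue)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the example's records out of the root logger
    logger.addHandler(handler)
    return logger, handler, log_queue
```

In a real service a logging.handlers.QueueListener on another thread would drain the queue into the actual handlers; the dropped counter is worth exporting as a metric so that shed records are visible.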

The fourth cause is global state that grows. The most common form is a session registry, request log, or user table held in memory and never trimmed. The signature is memory that grows in proportion to the number of distinct users or sessions. The fix is to bound the structure or move it to a database.

The fifth cause, increasingly common in 2026, is C extension leaks. Python libraries with native extensions (numpy, pandas, certain crypto libraries, image processing) can leak memory in their C code. When the extension allocates through malloc directly, the leak is invisible to Python-level tools such as tracemalloc and objgraph, because the memory never passes through Python's allocator. The signature is RSS growth without growth in Python object count. The fix is to upgrade the library, or to find an alternative.

Diagnostic tools, in order of usefulness

The first tool is tracemalloc from the standard library. Enable it at startup, take a snapshot, run the service for an hour, take another snapshot, compare. The diff shows which lines of Python code allocated the most net new memory. This is the highest-leverage tool because it requires no external dependencies and works in any Python service.
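
The snapshot-and-compare workflow looks roughly like this. The list of byte strings is a stand-in for an hour of real traffic, and top_growth is a hypothetical helper; start, take_snapshot, and compare_to are the standard tracemalloc calls.

```python
import tracemalloc


def top_growth(before, after, limit=10):
    """Return the allocation sites with the largest net change between snapshots."""
    return after.compare_to(before, "lineno")[:limit]


tracemalloc.start(25)  # record up to 25 frames per allocation
baseline = tracemalloc.take_snapshot()

# Stand-in for "run the service for an hour": retain about 10 MB.
leaked = [bytes(1024) for _ in range(10_000)]

current = tracemalloc.take_snapshot()
top = top_growth(baseline, current, limit=5)
for stat in top:
    print(stat)  # file, line number, size delta, count delta
```

Tracing adds CPU and memory overhead of its own, so it is worth measuring on one instance before enabling it fleet-wide.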

The second tool is objgraph for visualizing reference graphs. tracemalloc reports allocation sites, not owners; once it has pointed you at the type that is accumulating instances, objgraph shows what is keeping them alive. The output is a graph that reveals the chain of references from a long-lived object, typically a module or a global container, to the leaking object. Once you see that path, the fix is usually obvious.

The third tool is memray, a more sophisticated memory profiler from Bloomberg. It tracks every allocation, including those made in C extensions, and produces flame graphs of allocation sources. It is heavier than tracemalloc, but worth running when tracemalloc is not finding the leak. The C-extension visibility is what makes it irreplaceable for the fifth cause above.

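The fourth tool is the gc module's debug flags. gc.set_debug(gc.DEBUG_SAVEALL) makes the cycle detector append every unreachable object it finds to gc.garbage instead of freeing it, allowing post-hoc analysis of what is being collected and why. Useful for confirming that the cycle detector is doing its job, and for discovering cycles you did not know your code was creating. Because nothing found by the detector is freed while the flag is set, it belongs in a debugging session, not in production.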
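
A small sketch of that workflow: set the flag, run the suspect code, force a collection, and look at what landed in gc.garbage. Node, make_cycle, and collect_and_inspect are hypothetical names for the example.

```python
import gc


class Node:
    def __init__(self):
        self.peer = None


def make_cycle():
    """Create two objects that reference each other, then drop them."""
    a, b = Node(), Node()
    a.peer, b.peer = b, a


def collect_and_inspect(suspect):
    """Run suspect() and return what the cycle detector had to clean up."""
    gc.collect()  # flush garbage that predates the experiment
    gc.set_debug(gc.DEBUG_SAVEALL)
    try:
        suspect()
        gc.collect()
        found = list(gc.garbage)
    finally:
        gc.set_debug(0)      # restore normal collection
        gc.garbage.clear()   # release the saved objects
    return found


found = collect_and_inspect(make_cycle)
node_count = sum(isinstance(obj, Node) for obj in found)
```

Objects freed by plain reference counting never show up here, so an empty result means the suspect code created no cycles, not that it allocated nothing.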

The fifth tool is the operating system's own view of process memory (/proc/<pid>/smaps_rollup on Linux, vmmap on macOS) for the case where the leak is below the Python layer entirely. When tracemalloc shows stable Python heap and RSS keeps growing, the allocator or a C extension is the source.
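
On Linux, that comparison can be sketched by parsing smaps_rollup and setting the result against tracemalloc's traced total. The helper names are hypothetical, read_own_rollup works only where /proc/self/smaps_rollup exists, and the sample text is made-up data in the file's "Name: value kB" layout.

```python
from pathlib import Path


def parse_smaps_rollup(text):
    """Parse 'Name:   123 kB' lines into a dict of name -> bytes."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # The header line (address range, "[rollup]") fails this check.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0]) * 1024
    return fields


def read_own_rollup():
    """Linux only: memory summary for the current process."""
    return parse_smaps_rollup(Path("/proc/self/smaps_rollup").read_text())


def unexplained_bytes(rss_bytes, traced_bytes):
    """RSS not accounted for by allocations that tracemalloc can see."""
    return max(rss_bytes - traced_bytes, 0)


# Made-up sample in the smaps_rollup layout, for illustration only.
sample = (
    "00400000-7ffc00000000 ---p 00000000 00:00 0    [rollup]\n"
    "Rss:              204800 kB\n"
    "Pss:              198000 kB\n"
    "Anonymous:        150000 kB\n"
)
parsed = parse_smaps_rollup(sample)
```

The absolute gap is never zero, since RSS also covers the interpreter, shared libraries, and allocator overhead. What matters is whether the gap keeps widening while tracemalloc.get_traced_memory() stays flat.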

The discipline of bounded growth

The leak-resistant pattern, applied at design time, is that every long-lived data structure has a defined maximum size. Caches use bounded implementations. Queues use bounded implementations. Connection pools have a max. Worker pools have a max. The number of objects of any given type is bounded above by something other than "however many requests we have served."
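
In standard-library terms, the sizing decision is often a single constructor argument. The sketch below shows a bounded history and a bounded work queue; the names and limits are illustrative, and submit is a hypothetical helper that rejects work instead of letting the queue grow.

```python
import queue
from collections import deque

# Keeps only the most recent 1000 entries; older ones fall off the left end.
recent_requests = deque(maxlen=1000)

# Holds at most 100 pending jobs.
work_queue = queue.Queue(maxsize=100)


def submit(job) -> bool:
    """Enqueue a job, or report failure when the queue is full."""
    try:
        work_queue.put_nowait(job)
        return True
    except queue.Full:
        return False  # the caller decides whether to retry or shed the job


for i in range(5000):
    recent_requests.append(i)

accepted = sum(submit(i) for i in range(150))
```

Both structures now have a ceiling that does not depend on how many requests the service has handled.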

This discipline rules out a class of leak before it can happen. The structures that grow without bound do not exist in the codebase, because every data structure has a sizing decision baked in.

The pattern also applies to logs and traces. The amount of buffered telemetry is bounded by the buffer configuration, not by request volume. The fallback when the buffer fills is to drop, not to grow. A monitoring system that brings the application down when monitoring is overloaded is a worse failure mode than missing some traces.

The infrastructure complement

The application discipline is paired with infrastructure that catches the cases where the discipline fails. Container memory limits, process restarts on memory thresholds, and OOM killer dry-runs in staging environments are the safety net.

The pattern that scales: every service runs with a hard memory limit set in the orchestrator. When the limit is exceeded, the service restarts. The restart is fast (under 10 seconds for a typical Python service), and the load balancer routes around it. A leak does not bring the service down; it triggers a restart that the user does not see.

This is what we run across DocuMint, CronPing, FlagBit, and WebhookVault. Each container has a memory limit. If memory growth ever became a problem, the restart would mask it long enough to investigate. So far we have not had to.

The deeper point is that memory leaks in Python are a managed risk, not an eliminated one. The garbage collector covers most cases. The bounded-growth discipline covers most of the rest. The infrastructure safety net covers what slips through. The combination is reliable enough to run a long-running service for months without intervention.