Designing API Health Check Endpoints: Beyond the 200 OK

A health check endpoint that returns 200 for everything is a load balancer's favorite and an operator's worst enemy. The distinction between liveness, readiness, and dependency health is what makes the difference under real production conditions.

Every production service has a health check endpoint. Most of them do roughly the same thing: return 200 OK if the process is responsive, return something else (or nothing) if the process is dead. This is enough to satisfy load balancers and basic uptime monitors, and it is enough to mislead operators during real outages.

The problem is that the same health check has to answer three different questions for three different audiences. Load balancers ask "should I send traffic to this instance right now?" Orchestrators ask "should I restart this instance?" Dashboards ask "is anything wrong that a human should know about?" The right answers to these three questions are not the same, and conflating them into a single 200-or-not endpoint produces failure modes in all three.

The three-question framework

The Kubernetes liveness/readiness/startup probe split is one workable answer, and it generalizes beyond Kubernetes. Liveness asks whether the process is responsive at all: can it accept a TCP connection, parse an HTTP request, and produce a response within a few hundred milliseconds? Failure means the process is wedged and needs to be killed and restarted. Readiness asks whether the process is currently capable of handling production traffic: are its dependencies reachable, are its caches warm, has startup completed, and is it free of any in-progress shutdown? Failure means temporarily route traffic elsewhere. Startup asks whether initial setup has completed: have migrations run, have caches been warmed, are connection pools established? Failure means keep waiting before testing liveness or readiness.

Most services need at least liveness and readiness as separate endpoints with different semantics. The startup probe is most useful when warm-up time is long enough that an aggressive liveness probe would kill the process before it finished starting. For services with sub-second startup, startup probes are optional.

What liveness should check

The liveness check should be the cheapest possible affirmation of "this process is alive and responding." A typical implementation: a static HTTP handler that returns 200 with a one-line body, no database query, no dependency check, no work. The reason to keep it cheap is that it is called frequently (every few seconds) and any work in the liveness handler is work that runs constantly.

The reason to keep it isolated from dependencies is that liveness failures mean "kill and restart." If the liveness check fails because a database is briefly slow, the orchestrator restarts the process, which makes the database problem worse (more connections, more re-initialization) without fixing anything. The mantra: liveness failure is a process-level problem, not a system-level problem. Anything that an unrelated component can break does not belong in liveness.
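As a minimal sketch of such a handler, the WSGI application below answers with a static 200 and touches nothing else. The `/livez` path and the `liveness_app` name are assumptions made for illustration, not something the services described here are known to use.

```python
def liveness_app(environ, start_response):
    """WSGI app: answer the (assumed) /livez path with a static 200."""
    if environ.get("PATH_INFO") != "/livez":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]

    # No database query, no dependency call, no work beyond the response.
    body = b"ok\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# To serve it locally with the standard library (blocks until interrupted):
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8080, liveness_app).serve_forever()
```

Because the handler never reaches outside the process, a slow database or a failing downstream service cannot make it fail, which is the property the restart semantics require.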

What readiness should check

The readiness check is where dependency status belongs. A reasonable implementation: confirm database connectivity (with a fast query, not a slow one), confirm critical downstream service availability, confirm the process is not in graceful shutdown, confirm initial loading is complete. The check should fail if any of these are not true and should succeed otherwise.

The discipline that often gets missed: a readiness check should be aggressive about failing. The cost of failing readiness is that the load balancer routes traffic to other instances; the benefit is that production users do not see errors that would have happened on this instance. The cost is mild and reversible; the benefit is real and customer-facing. Erring on the side of failing readiness when dependencies look shaky is usually the right call.

The check should exercise a representative slice of the real workload. A readiness check that does SELECT 1 proves only that Postgres is accepting connections, not that the queries the service makes will succeed. A readiness check that does a representative read query proves the access path the service depends on is healthy. The implementation is slightly more expensive but considerably more diagnostic.
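The sketch below shows one way to aggregate readiness checks and why a representative read is more diagnostic than SELECT 1. It uses an in-memory SQLite database purely as a runnable stand-in for Postgres, and the `orders` table, `evaluate_readiness`, and `make_representative_read` are hypothetical names chosen for illustration.

```python
import sqlite3
from typing import Callable, Dict, Tuple

# A check passes by returning and fails by raising.
Check = Callable[[], object]


def evaluate_readiness(checks: Dict[str, Check]) -> Tuple[bool, Dict[str, str]]:
    """Run every check; the instance is ready only if all of them pass."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    ready = all(value == "ok" for value in results.values())
    return ready, results


def make_representative_read(conn: sqlite3.Connection) -> Check:
    """Build a check that runs a cheap query shaped like real traffic."""
    def check() -> None:
        # Hypothetical table; stands in for a read the service really performs.
        conn.execute("SELECT id FROM orders ORDER BY id DESC LIMIT 1").fetchone()
    return check
```

With a connection whose schema is missing the `orders` table, a SELECT 1 check still passes while the representative read fails, so `evaluate_readiness` reports the instance as not ready.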

What startup should check

The startup probe answers whether initialization completed. The typical contents: migrations have run, the cache layer has been primed, the connection pool has at least one usable connection, configuration has been loaded and validated. Once startup succeeds, it never returns to failure for the lifetime of the process; the probe is a one-way state transition.

During startup, liveness and readiness would both fail if they were probed, and the two failures deserve different treatment. Acting on a liveness failure during startup is usually wrong (it triggers a restart of a process that is making forward progress); the startup probe is the mechanism that suppresses liveness checks until the process is far enough along to handle them. A readiness failure during startup is correct (the service is not ready); the startup probe acknowledges that this is normal and not a sign of trouble.
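The one-way transition can be sketched as a latch that has a way to be set and no way to be cleared. `StartupState`, `run_startup`, and `startup_status` are illustrative names, and the 200/503 mapping is an assumption about how the probe is exposed.

```python
import threading
from typing import Callable, Iterable


class StartupState:
    """One-way latch: once marked complete, it stays complete."""

    def __init__(self) -> None:
        self._done = threading.Event()

    def mark_complete(self) -> None:
        self._done.set()

    def is_complete(self) -> bool:
        return self._done.is_set()
    # Deliberately no method to clear the latch.


def run_startup(state: StartupState, steps: Iterable[Callable[[], None]]) -> None:
    """Run initialization steps in order, then flip the latch."""
    for step in steps:
        step()  # an exception here propagates and leaves the latch unset
    state.mark_complete()


def startup_status(state: StartupState) -> int:
    """HTTP status a startup probe handler would return."""
    return 200 if state.is_complete() else 503
```

The steps passed to `run_startup` would be things like running migrations, priming the cache, and validating configuration; the latch flips only after all of them return.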

The dependency-health detail page

Beyond the three orchestrator-facing probes, most services benefit from a dependency-health endpoint intended for dashboards and operators. This is not on the path of load balancer or orchestrator decisions; it is for humans to see what is going on. The right contents: per-dependency status with last-check time and last-failure time, per-dependency latency or error rate, per-feature status if features can be independently degraded, build version and deploy time.
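One possible shape for that payload is sketched below. The field names and the `build_health_report` helper are assumptions for illustration; per-feature status is omitted for brevity and would sit alongside the dependency list.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class DependencyStatus:
    name: str
    healthy: bool
    last_check: str              # ISO 8601 timestamp of the most recent check
    last_failure: Optional[str]  # None if the dependency has never failed
    latency_ms: float


def build_health_report(
    deps: List[DependencyStatus], build_version: str, deployed_at: str
) -> str:
    """Render the operator-facing detail page as JSON."""
    report = {
        "build_version": build_version,
        "deployed_at": deployed_at,
        "dependencies": [asdict(dep) for dep in deps],
    }
    return json.dumps(report, indent=2)
```

The report carries no overall pass/fail verdict on purpose: it is meant to be read by a person, not acted on by a load balancer.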

The endpoint should not be consumed by automated systems that take action based on its contents. The reason: the schema will evolve, the dependencies will change, the format will be inconsistent across services. Operators reading the page are tolerant of inconsistency; load balancers are not. Keep the machine-readable endpoint stable and minimal; keep the human-readable endpoint rich and changing.

The graceful shutdown coordination

During shutdown, readiness should fail before the process stops accepting new connections. The sequence: receive SIGTERM, mark readiness as failed, wait for the load balancer's health-check interval plus a few seconds, then drain in-flight requests, then exit. The wait between failing readiness and starting the drain gives the load balancer time to notice the failure and stop sending new requests before the process goes away.

The reason readiness and liveness are separable here: during the drain window, liveness should still pass because the process is alive and processing existing requests; readiness should fail because new traffic should go elsewhere. A health check that conflates these two will either restart the process during drain (bad) or keep sending it traffic during shutdown (also bad).
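The shutdown sequence can be sketched roughly as follows. The `Lifecycle` class, the two-second margin, and the `drain` callback are assumptions for illustration; a real service would wire `drain` to whatever its HTTP server offers for finishing in-flight requests.

```python
import signal
import threading
import time
from typing import Callable


class Lifecycle:
    def __init__(self, lb_check_interval_s: float, margin_s: float = 2.0) -> None:
        self.shutting_down = threading.Event()
        # Health-check interval plus a few seconds of margin.
        self.grace_s = lb_check_interval_s + margin_s

    def liveness_status(self) -> int:
        # Still alive during drain: the process is serving existing requests.
        return 200

    def readiness_status(self) -> int:
        # New traffic should go elsewhere as soon as shutdown begins.
        return 503 if self.shutting_down.is_set() else 200

    def shutdown(self, drain: Callable[[], None],
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.shutting_down.set()  # 1. fail readiness
        sleep(self.grace_s)       # 2. give the load balancer time to notice
        drain()                   # 3. finish in-flight requests; caller then exits


def install_sigterm_handler(lifecycle: Lifecycle, drain: Callable[[], None]) -> None:
    """Run the shutdown sequence on SIGTERM (call from the main thread)."""
    def handler(signum, frame):
        # Run in a thread so the signal handler itself returns immediately.
        threading.Thread(target=lifecycle.shutdown, args=(drain,)).start()
    signal.signal(signal.SIGTERM, handler)
```

Keeping liveness and readiness as separate methods is what lets the drain window report "alive but not accepting new work".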

What not to do in any of the checks

Expensive operations: heavy database queries, S3 listings, downstream service health probes that themselves do expensive work. These make the health check itself a load source, and a slow health check that blocks other requests is its own failure mode.

Operations that have side effects: writing to a database, creating a log entry, incrementing a metric counter that is also incremented by real work. Health checks should be read-only and idempotent; otherwise the metrics, logs, and database state get distorted by the volume of health-check traffic.

Authentication: health check endpoints should be unauthenticated, because the authentication system itself is a possible failure point. The endpoints should be on private networks or behind a firewall that blocks external access; the authentication burden should be on the network layer, not the application layer.

The signals to monitor

Four metrics catch most health-check pathologies. First, health check latency: it should always stay under 100 ms; spikes indicate the check is doing work it should not. Second, ratio of liveness failures to restarts: it should be roughly 1:1 in normal operation; if liveness frequently fails without the orchestrator restarting the process, the failures are transient blips rather than a wedged process, and the check is too sensitive. Third, ratio of readiness failures to actual outages: readiness should fail at the start of dependency issues and recover when they resolve; persistent readiness failure with no operator alerting is a missed page. Fourth, the post-startup latency profile: requests right after startup should not be slower than steady-state by more than a small factor; large slowness indicates the startup probe is succeeding before the process is actually warm.
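For the first of these signals, a small wrapper can time each check and flag the ones that exceed the budget. This is a sketch only: `timed_check` is a hypothetical helper, and the injectable `clock` parameter exists so the behavior can be exercised without real delays.

```python
import time
from typing import Callable, Tuple


def timed_check(
    check: Callable[[], object],
    budget_ms: float = 100.0,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, bool]:
    """Run a check and return (elapsed_ms, over_budget)."""
    start = clock()
    check()
    elapsed_ms = (clock() - start) * 1000.0
    return elapsed_ms, elapsed_ms > budget_ms
```

The elapsed value would normally be exported as a metric and the over-budget flag used for alerting, rather than either one changing the health check result itself.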

Across our four products

DocuMint, CronPing, FlagBit, and WebhookVault each have /healthz endpoints that combine liveness and readiness because we run them on Caddy with passive health checking rather than a Kubernetes orchestrator that wants the three-way split. The trade-off works because our services have negligible startup time and few dependencies, so the conflation does not produce the failure modes that would matter in a more complex deployment. The endpoint behavior we have settled on: return 200 with a JSON body listing component status, return 503 if any component is unhealthy, never return 500 because that triggers different alerting paths.
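The response rule described above (200 with component status, 503 if anything is unhealthy, never 500) can be sketched like this. It is an illustration under assumed names and an assumed JSON shape, not the code these services run.

```python
import json
from typing import Callable, Dict, Tuple


def healthz_response(components: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Return (status_code, json_body) for a combined /healthz endpoint."""
    status: Dict[str, str] = {}
    for name, probe in components.items():
        try:
            status[name] = "ok" if probe() else "unhealthy"
        except Exception:
            # A crashing probe is reported as unhealthy so the endpoint
            # answers 503 instead of surfacing an unhandled error as 500.
            status[name] = "unhealthy"
    code = 200 if all(value == "ok" for value in status.values()) else 503
    body = json.dumps({
        "status": "ok" if code == 200 else "unhealthy",
        "components": status,
    })
    return code, body
```

Catching exceptions per component is what keeps the endpoint on the 200/503 pair even when a probe itself is buggy.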

The deeper observation is that the health check endpoint is the API surface most services design last and pay attention to least, despite it being one of the most consequential endpoints in production. The same endpoint affects load balancing, orchestration restart decisions, and human alerting; the same naive implementation can fail all three audiences. Spending an hour thinking about what the three audiences actually need usually pays for itself within the first incident where the right endpoint design would have made a difference.
