Designing API Resource Locks: When Customers Need to Reserve State Across Requests

Idempotency keys solve the single-request retry problem. They do not solve the multi-request workflow problem where the customer needs to inspect a resource, make a decision, and then mutate it without another caller intervening between the steps. Resource locks are the API primitive for that problem.

Idempotency keys are the right answer to the question of whether a retried POST will double-charge the customer. They are the wrong answer to a different question: how does a customer perform a read-then-write workflow against a shared resource without another caller invalidating the read between the steps?

The standard answer in shared-database systems is row-level locking via SELECT FOR UPDATE: the caller holds a transaction open, reads the row with a lock, makes a decision, writes the row, and commits. The lock is released automatically when the transaction commits or rolls back. This pattern does not transfer to APIs because HTTP is request-scoped: there is no persistent transaction across requests. The API equivalent has to be invented at the protocol level.

What resource locks actually do

A resource lock at the API level is a server-side reservation that prevents other callers from mutating a specific resource until the lock is released or expires. The mechanism is straightforward: the caller requests a lock, gets back a lock token, performs the read-then-write workflow, and releases the lock when done. Other callers attempting to acquire the same lock get a 423 Locked or 409 Conflict response until the lock is released or expires.
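On the caller's side, the acquire, work, release sequence can be wrapped so that the release always runs, even when the workflow raises. The following is a minimal sketch, not a real SDK: the LockClient interface and the held_lock helper are hypothetical names introduced for illustration.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol


class LockClient(Protocol):
    """Hypothetical client interface; the method names are assumptions."""

    def acquire(self, resource: str, ttl_seconds: int) -> str: ...

    def release(self, resource: str, token: str) -> None: ...


@contextmanager
def held_lock(client: LockClient, resource: str, ttl_seconds: int = 60) -> Iterator[str]:
    """Acquire a lock, yield its token, and release it on the way out."""
    token = client.acquire(resource, ttl_seconds)
    try:
        # The caller performs its read-then-write workflow here,
        # passing the token along with each write.
        yield token
    finally:
        # Runs on success and on exceptions. If the process crashes
        # before reaching this point, the lock's expiry is the fallback.
        client.release(resource, token)
```

A caller would write "with held_lock(client, "flags/feature-x") as token:" and perform the reads and writes inside the block.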

The pattern shows up across a small number of API surfaces. AWS S3 Object Lock provides object-level write protection, although it is a retention lock that blocks overwrites and deletes for a retention period rather than a mutual-exclusion lock held by one caller. Stripe payment_intents have a confirm-with-pending-state pattern that is structurally similar. GitHub Actions concurrency groups serialize workflow runs against a named group. None of these use exactly the same API shape, but the underlying problem they solve is closely related.

When resource locks are the right answer

The pattern fits four cases. The first is conditional writes where the condition cannot be expressed as a single HTTP request—when the decision logic is too complex to fit in an If-Match header. A customer wants to add a row to a budget only if the budget would not exceed a limit after the addition; the limit calculation involves multiple resources; the work cannot be a single API call.

The second is multi-step transactions where intermediate state is visible to the customer. Consider a wizard-style flow where the customer fills out four screens and the API needs to hold reservations for the duration of the wizard. Without locks, another caller can grab the reserved capacity between screens, leaving the wizard customer with a failed submission at the end.

The third is batch operations against shared aggregates where atomicity matters. The customer wants to update twelve rows in a way that is consistent—either all twelve succeed or none does—and the consistency cannot be enforced by a single bulk endpoint because the rows are in different resources.

The fourth is integration with external systems that have their own concurrency requirements. The customer is moving inventory between an Anethoth product and a third-party system, and the third-party system needs serialized access to inventory rows. The lock on the Anethoth side coordinates the cross-system workflow.

The minimum viable lock API

The minimum surface has three parts: a POST endpoint to acquire a lock, a DELETE endpoint to release it, and a lock-token check on the resource's normal write endpoints. The acquire endpoint takes a resource identifier and an optional TTL, and returns a lock token and its expiry time. The release endpoint takes the lock token and releases the lock. The write endpoints on the resource accept an optional X-Lock-Token header and refuse the write if the resource is locked and the header does not match.

POST /v1/flags/feature-x/lock
{ "ttl_seconds": 60 }
=> 201 { "lock_token": "lock_abc123", "expires_at": "2026-..." }

PATCH /v1/flags/feature-x
X-Lock-Token: lock_abc123
{ "rollout_percentage": 50 }
=> 200 { ... }

DELETE /v1/flags/feature-x/lock
X-Lock-Token: lock_abc123
=> 204
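The server-side bookkeeping behind those three calls can be sketched as a small in-memory store. This is an illustration under stated assumptions, not a production design: the LockStore class and its method names are hypothetical, the state lives in one process, and the 409 returned for a release with a non-matching token is an assumption the text does not specify.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Lock:
    token: str
    expires_at: float  # seconds on the injected clock


class LockStore:
    """Single-process sketch; a real service needs shared, atomic storage."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._locks: Dict[str, Lock] = {}

    def _live(self, resource: str) -> Optional[Lock]:
        """Return the current lock, dropping it first if it has expired."""
        lock = self._locks.get(resource)
        if lock is not None and lock.expires_at <= self._clock():
            del self._locks[resource]
            return None
        return lock

    def acquire(self, resource: str, ttl_seconds: float) -> Tuple[int, Optional[str]]:
        """POST .../lock: (201, token), or (423, None) if already held."""
        if self._live(resource) is not None:
            return 423, None
        token = "lock_" + secrets.token_hex(8)
        self._locks[resource] = Lock(token, self._clock() + ttl_seconds)
        return 201, token

    def check_write(self, resource: str, token: Optional[str]) -> int:
        """Write endpoints: 200 if unlocked or the token matches, else 423."""
        lock = self._live(resource)
        if lock is None or lock.token == token:
            return 200
        return 423

    def release(self, resource: str, token: str) -> int:
        """DELETE .../lock: 204 on success, 409 (assumed) otherwise."""
        lock = self._live(resource)
        if lock is None or lock.token != token:
            return 409
        del self._locks[resource]
        return 204
```

The clock is injected so that expiry can be exercised in tests without sleeping. Reads never consult the store, which keeps them unblocked.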

Every lock has a TTL, and the server bounds it. The ttl_seconds field may be omitted from the request, in which case the server applies a default, but there is no way to acquire a lock that never expires, because any abandoned lock would then hold the resource forever. A typical bound is a 5-minute default and a 30-minute maximum, with the server free to refuse acquisitions with TTLs above the maximum.
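The bounds can be enforced in one small function at the top of the acquire handler. A sketch, assuming that an omitted TTL falls back to the default and that out-of-range values are rejected rather than silently clamped; resolve_ttl and TtlRejected are hypothetical names.

```python
from typing import Optional

DEFAULT_TTL_SECONDS = 5 * 60   # the 5-minute default from the text
MAX_TTL_SECONDS = 30 * 60      # the 30-minute maximum from the text


class TtlRejected(ValueError):
    """Raised when the requested TTL is outside the allowed range."""


def resolve_ttl(requested: Optional[int]) -> int:
    """Return the TTL the server will actually apply to a new lock."""
    if requested is None:
        # Assumption: an omitted ttl_seconds means "use the default".
        return DEFAULT_TTL_SECONDS
    if requested <= 0 or requested > MAX_TTL_SECONDS:
        # Rejecting (for example with a 400) keeps the caller's
        # expectation and the real expiry from silently diverging.
        raise TtlRejected(f"ttl_seconds must be between 1 and {MAX_TTL_SECONDS}")
    return requested
```

Clamping to the maximum instead of rejecting is also defensible, as long as the response reports the expiry that was actually applied.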

What lock acquisition should and should not block

The acquisition endpoint should fail fast when the resource is already locked rather than blocking. Returning 423 Locked or 409 Conflict immediately lets the caller decide how to retry. Blocking acquisition on the server side requires holding a long-lived HTTP connection and creates its own queuing problems.

The exception is when the caller explicitly opts into blocking acquisition via a query parameter or header, with a server-side maximum wait time. The pattern is similar to the difference in SQL between SELECT ... FOR UPDATE NOWAIT and SELECT ... FOR UPDATE under a lock_timeout setting: the default is fail-fast, but the caller can opt into bounded waiting if the workflow tolerates it.
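Both modes can share one code path in which fail-fast is simply a wait of zero. The sketch below assumes a hypothetical try_acquire callable that returns a token or None, and an assumed server cap of 10 seconds; the clock and sleep functions are injected so the loop can be tested without real delays.

```python
import time
from typing import Callable, Optional

SERVER_MAX_WAIT_SECONDS = 10.0  # assumed cap; the text only says one must exist


def acquire(
    try_acquire: Callable[[], Optional[str]],
    wait_seconds: float = 0.0,
    poll_interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return a lock token, or None if the lock stayed busy (map to 423)."""
    # wait_seconds=0 is the fail-fast default; larger values are capped.
    deadline = clock() + min(max(wait_seconds, 0.0), SERVER_MAX_WAIT_SECONDS)
    while True:
        token = try_acquire()
        if token is not None:
            return token
        remaining = deadline - clock()
        if remaining <= 0:
            return None
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
```

Polling is the simplest way to show the bounded wait; a real implementation might instead wake waiters when the lock is released.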

Lock expiration and renewal

Locks expire automatically at the TTL. The expired lock is not retained as state; the resource becomes free to acquire. This prevents the abandoned-lock problem where a crashed caller leaves the resource locked indefinitely.

Renewal extends the TTL of an existing lock. The caller sends a renewal request with the lock token and a new TTL. The renewal succeeds if the lock is still valid and the caller still holds it; the renewal fails with 410 Gone if the lock has already expired. Renewal is what makes the bounded TTL workable—a workflow that needs more than the default TTL can renew during execution rather than acquiring with a large initial TTL.
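The renewal check is short enough to sketch directly. Several details here are assumptions rather than anything the text specifies: the renew function name, returning 410 for a token that no longer matches (from that caller's point of view the lock is gone either way), measuring the new TTL from the time of renewal, and capping it at the same 30-minute maximum.

```python
from dataclasses import dataclass
from typing import Optional

MAX_TTL_SECONDS = 30 * 60  # same maximum as for acquisition (assumed)


@dataclass
class Lock:
    token: str
    expires_at: float  # seconds on the server clock


def renew(lock: Optional[Lock], token: str, ttl_seconds: float, now: float) -> int:
    """Extend a live lock in place; return an HTTP-style status code."""
    if lock is None or lock.expires_at <= now:
        return 410  # the lock has already expired
    if lock.token != token:
        return 410  # someone else holds it now; the caller's lock is gone
    # The new TTL counts from now, not from the previous expiry, so
    # repeated renewals cannot accumulate an unbounded lifetime.
    lock.expires_at = now + min(ttl_seconds, MAX_TTL_SECONDS)
    return 200
```

A long-running workflow would call this well before the current expiry, for example at half the TTL, so that one slow request does not cost it the lock.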

The wrong patterns

Three patterns fail in production. The first is locks without TTLs—any system where a crashed caller can hold a lock forever will eventually have stuck locks blocking customer workflows. The TTL is non-negotiable.

The second is fine-grained locking on every resource by default. Locks are a power tool that adds operational complexity and customer-side bug surface. Locks should be opt-in for the resources that genuinely need them, with the default being no lock and standard If-Match optimistic concurrency for resources that need lighter coordination.

The third is locks that block reads. A lock should only block writes—reads should always succeed without requiring lock acquisition. This matters for monitoring, debugging, and dashboard use cases that need to inspect resource state without taking a lock.
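For resources that only need the lighter coordination mentioned above, If-Match optimistic concurrency amounts to comparing a version tag before applying the write. A minimal sketch, assuming the ETag is derived from a hash of the resource body; etag_for and conditional_patch are hypothetical names, and requiring the header (428 when it is missing) is a design choice rather than something the text mandates.

```python
import hashlib
import json
from typing import Optional, Tuple


def etag_for(resource: dict) -> str:
    """Derive a quoted ETag from the canonical JSON form of the resource."""
    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_patch(
    resource: dict, patch: dict, if_match: Optional[str]
) -> Tuple[int, dict]:
    """Apply the patch only if the caller saw the current version."""
    if if_match is None:
        return 428, resource  # Precondition Required
    if if_match != etag_for(resource):
        return 412, resource  # Precondition Failed: the resource changed
    return 200, {**resource, **patch}
```

No state is held between requests: a caller that loses the race gets a 412, re-reads, and retries. That is why this is the cheaper default when the decision fits in a single write.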

The audit trail question

Lock acquisitions, renewals, and releases all belong in the audit log. The actor that acquired the lock, the time of acquisition, the time of release (or expiration), and the writes performed while the lock was held are the data that customer support and security investigations actually need.

The expired-lock case is particularly important to audit because it indicates either a crashed caller or a too-short TTL. Pattern analysis on expired locks helps identify TTL tuning needs and customer-side reliability issues. The audit data also catches the rare case where one customer is holding locks long enough to interfere with other workflows.
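As a rough illustration of that analysis, the audit events can be modeled as flat records and expirations counted per actor. The LockAuditEvent fields and the action names are assumptions chosen for this sketch, not an existing schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LockAuditEvent:
    actor: str      # who acquired the lock
    resource: str   # which resource it covered
    action: str     # assumed values: acquired, renewed, released, expired
    at: str         # ISO 8601 timestamp of the event


def expirations_by_actor(events: Iterable[LockAuditEvent]) -> Counter:
    """Count locks that ended by expiry rather than explicit release."""
    return Counter(e.actor for e in events if e.action == "expired")
```

An actor with a high count relative to its acquisitions is a candidate for either a longer TTL or a look at client-side reliability.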

Across our four products

None of our four products currently exposes a lock API, but the design space is mapped. FlagBit is the strongest candidate—the rollout percentage and targeting rule fields are exactly the kind of resource where a customer might want to compute a complex update outside the API and then apply it atomically. WebhookVault could use locks for the webhook endpoint configuration mutation flow, though If-Match optimistic concurrency is sufficient for most use cases. DocuMint and CronPing do not have natural lock candidates because the resources are mostly write-once or independent.

The deeper observation is that resource locks are a power tool for the small number of API surfaces where optimistic concurrency is insufficient. Most APIs do not need them, and adding them speculatively adds operational complexity without benefit. The right discipline is to identify the actual workflow that needs them, design the lock API around that workflow, and resist adding locks to resources that do not need them. When the workflow does exist and locks fit, the alternative to providing them is some combination of customer-side retry logic, cross-API workarounds, and inevitable race conditions that produce hard-to-reproduce customer bugs.


Our products: DocuMint (PDF invoice generation API), CronPing (cron job monitoring with status pages), FlagBit (feature flags API for modern teams), and WebhookVault (webhook capture and replay) put these patterns into production.
