Queue Worker Patterns: Single Consumer, Worker Pools, and the Subtleties of Concurrency

The first job queue most teams build is a single consumer reading from a SQLite or Postgres table in a tight polling loop. It works fine until it doesn't. The failure usually arrives suddenly: a slow job clogs the consumer, jobs back up, customers see delayed processing of unrelated work, and the team discovers that what looked like a queue was actually a queue bolted to a single-threaded executor.

This post covers the four patterns that show up at different scales, their concurrency models, the failure modes each pattern hides until production, and the migration path between them.

Pattern 1: Single consumer in a polling loop

The starting pattern is one process polling a jobs table, claiming jobs one at a time, processing them serially, and updating their status on completion. This is the right pattern whenever the total work fits comfortably in one core: a few hundred jobs per hour at sub-second processing time each, for example, or a much higher rate when each job takes well under a millisecond. At that scale single-consumer is correct because the operational simplicity is genuine.
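The loop described above can be sketched in a few lines. This is a minimal illustration rather than a production worker: the `jobs` table layout, the `handle` callback, and the `max_idle_polls` stop condition (there so the loop can terminate in a test) are all assumptions made for the example.

```python
import sqlite3
import time

# Hypothetical schema for this sketch; a real jobs table will carry more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
)
"""

def run_single_consumer(conn, handle, poll_interval=0.5, max_idle_polls=None):
    """Process pending jobs one at a time, oldest first.

    Stops after max_idle_polls consecutive empty polls (None means run forever).
    Returns the number of jobs processed.
    """
    idle_polls = 0
    processed = 0
    while max_idle_polls is None or idle_polls < max_idle_polls:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            # Nothing to do: back off before polling again.
            idle_polls += 1
            time.sleep(poll_interval)
            continue
        idle_polls = 0
        job_id, payload = row
        try:
            handle(payload)  # serial execution: a slow job blocks everything behind it
            status = "done"
        except Exception:
            status = "failed"
        conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
        conn.commit()
        processed += 1
    return processed
```

Because there is exactly one consumer, no claim step is needed: nothing else can pick up the row between the SELECT and the UPDATE.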

The failure mode is the slow job. One job that takes ten minutes blocks every other job behind it for ten minutes, regardless of how cheap the other jobs are. This is the head-of-line blocking pattern that motivates every more complex queue architecture.

For the four Anethoth products, single-consumer is the right pattern in two specific places: CronPing's monitor-due-date checker, which runs sub-millisecond per check and processes a few thousand checks per minute; and FlagBit's evaluation-counter aggregator, which is also sub-millisecond per aggregation. Single-consumer is wrong for DocuMint PDF generation (variable per-job time) and WebhookVault webhook forwarding (network-bound and parallelizable).

Pattern 2: Worker pool with shared queue

The next step is N consumers reading from the same jobs table, each claiming and processing jobs independently. The claim mechanism is the critical primitive: in PostgreSQL, SELECT ... FOR UPDATE SKIP LOCKED lets each worker skip rows that another worker has already locked, so claimants never wait on each other; in SQLite, an UPDATE with a RETURNING clause (available from SQLite 3.35) and a status filter approximates the same semantics at the cost of a brief write lock. N is tuned to the job profile: CPU-bound jobs get N equal to the number of cores, network-bound jobs get N much higher.
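As a rough sketch of the SQLite side of the claim, the version below uses a portable variant of the same idea: select a candidate, then run an UPDATE guarded by the status filter and check the row count, which works on SQLite builds that predate RETURNING. The table layout, the `claimed_by` column, and the function name are assumptions for illustration.

```python
import sqlite3

# Hypothetical schema for this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    claimed_by TEXT
)
"""

def claim_next_job(conn, worker_id):
    """Claim the oldest pending job for worker_id.

    Returns (job_id, payload), or None when no pending job is left.
    """
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        job_id, payload = row
        # The status filter makes the UPDATE a compare-and-swap: it only
        # matches if nobody else claimed the row since the SELECT above.
        cur = conn.execute(
            "UPDATE jobs SET status = 'running', claimed_by = ? "
            "WHERE id = ? AND status = 'pending'",
            (worker_id, job_id),
        )
        conn.commit()
        if cur.rowcount == 1:
            return job_id, payload
        # Another worker won the race for this row; look for the next one.
```

The important property is that the decision "this job is mine" is made by the database in a single statement, not by the worker comparing values it read earlier.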

The failure modes are different. Slow jobs no longer block other jobs because there are N consumers. But N consumers can produce N times the load on downstream dependencies — databases, third-party APIs, file systems — and the queue worker layer becomes the place where downstream rate limits are enforced or the place where downstream dependencies are taken down by load. The common bug is a worker pool sized for happy-path jobs that overwhelms a downstream dependency during a retry storm.
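One way to keep pool size and downstream load independent is a shared cap on in-flight calls to each dependency. The sketch below uses a concurrency cap, which is a simpler relative of a true rate limiter; the `DownstreamLimiter` class, the `run_pool` helper, and the `peak_in_flight` counter (kept only so the cap is observable) are hypothetical names for this example.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class DownstreamLimiter:
    """Caps how many calls may be inside one downstream dependency at once."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak_in_flight = 0  # highest concurrency actually observed

    def call(self, fn, *args):
        with self._slots:  # blocks while max_in_flight calls are already running
            with self._lock:
                self._in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
            try:
                return fn(*args)
            finally:
                with self._lock:
                    self._in_flight -= 1

def run_pool(jobs, limiter, downstream_call, workers=16):
    """Run jobs on a thread pool whose downstream calls share one limiter."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: limiter.call(downstream_call, job), jobs))
```

With this shape the pool can be sized for throughput while the limiter is sized for what the dependency tolerates, so a retry storm queues up inside the workers instead of landing on the database or API.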

The other failure mode is heterogeneous job times. A pool of ten workers processing a mix of one-second jobs and ten-minute jobs will have all ten workers stuck on ten-minute jobs at exactly the wrong moment. The fix is either separating the job classes into separate pools (next pattern) or limiting per-job time aggressively with timeouts.

Pattern 3: Per-class worker pools with priority

Worker pools per job class — fast vs slow, customer-facing vs background, paid-tier vs free-tier — solve the heterogeneous-job-time problem and let you size the pools differently. The cost is more moving parts: each pool has its own configuration, monitoring, and failure modes; the dispatch logic that puts jobs into the right pool needs to be correct; cross-pool dependencies require explicit coordination.

The pattern that survives at this scale is a small number of named pools — typically three to five — with clear names that match the actual job classes rather than abstract priorities. "PDF generation" and "thumbnail generation" are good pool names; "high priority" and "low priority" are bad pool names because they require the dispatcher to decide what counts as high. The dispatcher's job is mechanical mapping from job type to pool, and the named pools make the mapping obvious.

For DocuMint at meaningful scale, the right pool structure is "synchronous PDF" (small jobs that the user is waiting for, fast pool, low N) and "batch PDF" (the bulk-export and scheduled-generation jobs, slower pool, higher N). Mixing them in a single pool means a batch job blocks a synchronous one at exactly the worst time.
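The mechanical dispatcher described above can be as small as a lookup table. In this sketch the job type strings and pool identifiers are invented for illustration; only the idea of naming pools after job classes comes from the text.

```python
# Hypothetical job types mapped to pools named after the job class they serve.
POOL_FOR_JOB_TYPE = {
    "pdf.render_interactive": "synchronous-pdf",
    "pdf.bulk_export": "batch-pdf",
    "pdf.scheduled_report": "batch-pdf",
    "image.thumbnail": "thumbnail-generation",
}

def pool_for(job_type):
    """Return the pool name for a job type, failing loudly on unknown types."""
    try:
        return POOL_FOR_JOB_TYPE[job_type]
    except KeyError:
        # An unmapped job type is a configuration bug; do not guess a default pool.
        raise ValueError(f"no pool configured for job type {job_type!r}") from None
```

Refusing to fall back to a default pool is a deliberate choice here: a silent default is how a new slow job class ends up in the fast pool.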

Pattern 4: Distributed queue with per-job parallelism

The fourth pattern adds parallelism within individual jobs: each job is broken into sub-tasks that run in parallel, and the results are joined. This is the right pattern for jobs that are themselves parallelizable — fanout to many recipients, many independent sub-computations, many file uploads — and where the per-job latency matters more than throughput.

The cost is significant: the orchestration layer has to handle partial failure, retries of individual sub-tasks, and the join that brings results back together. This is where dedicated workflow engines like Temporal, Airflow, or Cadence start to earn their keep, because hand-rolling the orchestration is a meaningful amount of code that has to be operationally correct.

For WebhookVault webhook forwarding to many destinations, the per-job parallelism pattern would mean parsing the webhook once and dispatching to all subscribed destinations in parallel rather than serially. The implementation is straightforward when destinations are stateless, but the retry-and-partial-failure surface is where the complexity lives.
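A minimal sketch of that fan-out, assuming a hypothetical `send(destination, payload)` callable that raises on failure, might look like the following. It records which destinations failed rather than aborting on the first error, which is the smallest useful answer to the partial-failure question; retry policy is left out entirely.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fan_out(payload, destinations, send, max_workers=8):
    """Deliver one already-parsed payload to every destination in parallel.

    Returns (delivered, failed): a list of destinations that succeeded and a
    dict mapping each failed destination to the exception it raised.
    """
    delivered, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(send, dest, payload): dest for dest in destinations}
        for fut in as_completed(futures):
            dest = futures[fut]
            try:
                fut.result()
                delivered.append(dest)
            except Exception as exc:
                # One bad destination must not hide the outcome of the others.
                failed[dest] = exc
    return delivered, failed
```

The caller then decides what a job with a non-empty `failed` map means: retry only those destinations, mark the job partially complete, or fail it as a whole.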

The concurrency-control discipline

Across all four patterns, three concurrency-control questions need explicit answers: how does a worker claim a job atomically, how does a worker mark a job complete or failed, and how does a worker handle its own death mid-job. The first is the SKIP LOCKED or atomic UPDATE pattern. The second is a transactional write that updates the job row and any side-effect rows in the same database in one atomic step. The third is the visibility-timeout or lease pattern: the claim sets an expiration time, and a sweeper reclaims jobs whose lease expired without completion.
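The lease half of this can be sketched against SQLite as follows. The column names (`claimed_by`, `lease_expires_at`), the function names, and the injectable `now` parameter (there so the timing can be controlled in a test) are assumptions for the example, not a prescribed schema.

```python
import sqlite3
import time

# Hypothetical schema for this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'pending',
    claimed_by TEXT,
    lease_expires_at REAL
)
"""

def claim_with_lease(conn, worker_id, lease_seconds, now=None):
    """Claim the oldest pending job and stamp it with a lease expiry."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT id FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    cur = conn.execute(
        "UPDATE jobs SET status = 'running', claimed_by = ?, lease_expires_at = ? "
        "WHERE id = ? AND status = 'pending'",
        (worker_id, now + lease_seconds, row[0]),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None

def complete(conn, job_id, worker_id):
    """Mark a job done only if this worker still holds the claim."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'done', lease_expires_at = NULL "
        "WHERE id = ? AND status = 'running' AND claimed_by = ?",
        (job_id, worker_id),
    )
    conn.commit()
    return cur.rowcount == 1

def sweep_expired(conn, now=None):
    """Return jobs whose lease ran out to the pending state; returns the count."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE jobs SET status = 'pending', claimed_by = NULL, lease_expires_at = NULL "
        "WHERE status = 'running' AND lease_expires_at < ?",
        (now,),
    )
    conn.commit()
    return cur.rowcount
```

Guarding `complete` with `claimed_by` means a worker that lost its lease cannot overwrite the row, but that only protects the queue's own bookkeeping, not whatever work the worker did in the meantime.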

The bug that shows up in production is double-processing: a worker claims a job, starts processing, and stalls on a long GC pause or a frozen container; the lease expires, another worker reclaims the job, and the original worker eventually wakes up and completes the work. Both workers report success and the side effect happens twice. The fix is per-job idempotency keys at the side-effect layer, not anything in the queue itself. The queue cannot prevent double-processing under all failure modes; idempotent workers can tolerate it.
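For side effects that live in the same database as the key, an idempotency guard can be sketched with a unique constraint. The `applied_effects` table and `apply_once` function are hypothetical; for external side effects such as an API call, the downstream service itself would need to accept and honor the key, which this sketch does not cover.

```python
import sqlite3

# Hypothetical table recording which side effects have already been applied.
SCHEMA = """
CREATE TABLE IF NOT EXISTS applied_effects (
    idempotency_key TEXT PRIMARY KEY,
    result TEXT NOT NULL
)
"""

def apply_once(conn, idempotency_key, effect):
    """Run effect() at most once per key.

    Returns (result, applied): applied is False when the key was already
    recorded and the stored result is returned instead of re-running effect.
    """
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            # The primary key rejects a second insert of the same key.
            conn.execute(
                "INSERT INTO applied_effects (idempotency_key, result) VALUES (?, '')",
                (idempotency_key,),
            )
            result = effect()
            conn.execute(
                "UPDATE applied_effects SET result = ? WHERE idempotency_key = ?",
                (result, idempotency_key),
            )
        return result, True
    except sqlite3.IntegrityError:
        row = conn.execute(
            "SELECT result FROM applied_effects WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        return row[0], False
```

If `effect` raises, the transaction rolls back and the key is released, so a later retry can still apply the effect.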

The migration path

The escalation from single-consumer to worker pool to per-class pools to distributed orchestration is one-way: each step adds complexity that does not unwind, so the right answer is to stay at the simplest pattern that works for current load and migrate when the failure mode is actually observed. The wrong answer is to build the distributed-orchestration pattern from day one because "we will need it eventually." Most queues never need it; the ones that do can migrate when the time comes; and the cost of the complex pattern is paid every day from the moment you adopt it.

The other wrong answer is to delay the migration until the failure mode has caused a customer outage. The signs that you are about to need to migrate are: P99 job latency growing faster than the median, head-of-line blocking incidents in the postmortem record, and operational interest in "draining the queue" as a concept. When the operations team starts asking about queue depth, the next pattern is overdue.
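The first of those signals is easy to compute from job durations you are probably already logging. The sketch below compares the P99-to-median ratio across two time windows; the function names and the 1.5x growth threshold are arbitrary choices for illustration, and durations are assumed to be positive numbers in seconds.

```python
import statistics

def tail_ratio(durations):
    """Return P99 divided by the median for one window of job durations."""
    # quantiles(n=100) returns the 99 percentile cut points; index 98 is P99.
    p99 = statistics.quantiles(durations, n=100)[98]
    return p99 / statistics.median(durations)

def tail_is_diverging(previous_window, current_window, growth_threshold=1.5):
    """True when the tail has pulled away from the median between two windows."""
    return tail_ratio(current_window) > growth_threshold * tail_ratio(previous_window)
```

A ratio that keeps climbing while the median stays flat usually means a few slow jobs are holding up cheap ones, which is the head-of-line blocking signature.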