Background Job Priorities: Multi-Queue Patterns That Do Not Cause Starvation
Job queues with priorities sound straightforward. The naive implementation works fine until a flood of high-priority jobs causes low-priority jobs to wait indefinitely. The patterns that survive production traffic share a structural property: priority changes which queue runs next, not which jobs eventually run.
Most job queues start with a single priority level. Jobs go in, workers pull them out in FIFO order, and the system stays simple. The first request for priority comes when a small number of jobs are time-sensitive (send the password reset email now) and most are not (rebuild the search index). The naive implementation is to add a priority column, sort the queue by priority on each pull, and let the highest-priority jobs run first. This works in steady state and breaks under any flood that puts more high-priority work on the queue than the workers can drain. Low-priority jobs sit and wait, sometimes for hours, sometimes forever. The technical name for the failure is starvation, and the patterns that prevent it are worth understanding before the failure happens.
This post covers the four-pattern continuum from single-queue to multi-queue with weighted scheduling, the failure modes each pattern addresses, and the operational discipline that keeps any of them working over time. The patterns apply across the four products in our studio — DocuMint, CronPing, FlagBit, and WebhookVault — and are general enough to apply to any worker pool that processes asynchronous work.
Pattern one: single queue, FIFO
The starting pattern is one queue, one shared worker pool, jobs processed in arrival order. There is no priority, no class, no segregation. The strength of the pattern is that it cannot starve any job: every job that enters the queue eventually runs, and the wait time is bounded by the depth of the queue ahead of it.
The weakness is that one slow job class blocks every other class. If WebhookVault is processing a flood of slow webhook deliveries to a slow customer endpoint, fast webhook deliveries to fast endpoints have to wait their turn. The head-of-line blocking is the failure mode that motivates everything that comes next.
Pattern two: priority within a single queue
The first instinct is to add a priority column and pull the highest-priority job. The implementation is a single query — SELECT * FROM jobs WHERE status='pending' ORDER BY priority DESC, created_at ASC LIMIT 1 FOR UPDATE SKIP LOCKED — and it works fine in steady state. The problem is that under load, high-priority jobs can arrive faster than they drain, and low-priority jobs never get pulled. The behavior is correct by the priority order; the problem is that "correct by priority" is not the same as "every job eventually runs."
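The starvation is easy to reproduce without a database. The sketch below swaps the SQL query for an in-memory heap with the same ordering (priority descending, arrival order ascending); the function name and the per-tick arrival and drain rates are illustrative assumptions, not measurements from any of our products.

```python
import heapq
import itertools

HIGH, LOW = 10, 1


def simulate_strict_priority(ticks: int, high_arrivals_per_tick: int,
                             low_arrivals_per_tick: int,
                             drain_per_tick: int) -> dict[int, int]:
    """Count completed jobs per priority when workers always take the top job.

    Mimics ORDER BY priority DESC, created_at ASC with an in-memory heap.
    """
    heap: list[tuple[int, int, int]] = []
    seq = itertools.count()  # arrival order breaks ties, like created_at ASC
    done = {HIGH: 0, LOW: 0}
    for _ in range(ticks):
        for _ in range(high_arrivals_per_tick):
            heapq.heappush(heap, (-HIGH, next(seq), HIGH))
        for _ in range(low_arrivals_per_tick):
            heapq.heappush(heap, (-LOW, next(seq), LOW))
        # Workers drain a fixed number of jobs per tick, highest priority first.
        for _ in range(drain_per_tick):
            if not heap:
                break
            _, _, priority = heapq.heappop(heap)
            done[priority] += 1
    return done
```

With 2 high and 1 low arrival per tick and capacity for 4, everything drains and both classes complete. With 5 high arrivals per tick against the same capacity, `simulate_strict_priority(100, 5, 1, 4)` completes 400 high-priority jobs and zero low-priority ones: the order is exactly what the query asks for, and the low-priority jobs never run.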
The mitigation that sometimes appears is age-based priority boost: a job that has been waiting longer than some threshold gets its priority bumped up. This works in toy examples and breaks in production because the threshold is hard to set correctly and the boosting itself adds load to the queue scheduling.
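To make the tuning problem concrete, here is a minimal sketch of one possible aging rule, assuming a linear step boost. The `boost_after_s` and `boost_step` parameters are hypothetical; they are the thresholds that are hard to set, since they decide how quickly an old low-priority job overtakes a fresh high-priority one.

```python
from dataclasses import dataclass


@dataclass
class Job:
    base_priority: int
    enqueued_at: float  # seconds on any monotonic clock


def effective_priority(job: Job, now: float, boost_after_s: float,
                       boost_step: int) -> int:
    """Hypothetical aging rule: add boost_step for every boost_after_s waited."""
    waited = max(0.0, now - job.enqueued_at)
    return job.base_priority + int(waited // boost_after_s) * boost_step
```

With `boost_after_s=60.0` and `boost_step=5`, a priority-1 job enqueued at t=0 reaches 11 at t=125, while a priority-10 job enqueued at t=100 is still at 10, so the old job wins. Because the result depends on `now`, the ordering shifts on every pull, and whoever tunes the two knobs is effectively hand-writing a scheduler.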
Pattern three: separate queues with separate worker pools
The structural fix is to have multiple queues with separate worker pools, one per priority class. High-priority jobs go to the high-priority queue, served by a dedicated pool of workers; low-priority jobs go to the low-priority queue, served by a different pool. Neither queue can starve the other because they have independent worker capacity.
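A minimal sketch of the structure, assuming in-process threads and Python's standard `queue.Queue`: each class gets its own queue and its own fixed number of workers, and a worker never looks at another class's queue. The function name and pool sizes are illustrative; a production worker would block waiting for new jobs instead of exiting when its queue is empty.

```python
import queue
import threading


def run_dedicated_pools(jobs: dict[str, list[str]],
                        pool_sizes: dict[str, int]) -> dict[str, list[str]]:
    """Drain each class with its own workers; capacity is never shared."""
    queues = {name: queue.Queue() for name in jobs}
    done: dict[str, list[str]] = {name: [] for name in jobs}
    lock = threading.Lock()

    for name, items in jobs.items():
        for item in items:
            queues[name].put(item)

    def worker(name: str) -> None:
        q = queues[name]  # this worker only ever pulls from its own class
        while True:
            try:
                item = q.get_nowait()
            except queue.Empty:
                return  # sketch only: exit once this class's queue is drained
            # A real worker would execute the job here.
            with lock:
                done[name].append(item)

    threads = [threading.Thread(target=worker, args=(name,))
               for name, size in pool_sizes.items() for _ in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

A flood of 1,000 high-priority jobs only occupies the two high-priority workers; the single low-priority worker drains its own queue regardless of how deep the high-priority queue gets.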
The trade-off is that worker capacity is now allocated, not shared. A burst of low-priority work cannot use idle high-priority workers, so during a quiet period on the high-priority queue those workers sit idle while low-priority work piles up. Total throughput under uneven load is lower than with a shared pool, but latency for each class is much more predictable.
This pattern is right when the priority classes are operationally distinct and work in one class must never delay work in another. It fits our hypothetical DocuMint case, where synchronous PDF generation should never be delayed by batch PDF generation, even if that means the synchronous workers sit idle while the batch queue backs up.
Pattern four: shared workers with weighted scheduling
The richer pattern keeps the multiple queues but uses a shared worker pool with weighted scheduling: workers pick from queues with a probability or rotation that favors high-priority queues without ever letting low-priority queues go unserved indefinitely. A common scheme is to pick from the high-priority queue 80% of the time and the low-priority queue 20% of the time, with the percentages tuned to the workload. Another scheme is weighted round-robin: serve four high-priority jobs, then one low-priority job, and repeat. In either scheme, a worker whose chosen queue is empty takes work from the other queue instead of idling.
The strength is that workers are shared, so total throughput stays high; the weakness is that the scheduling logic now lives in the workers and has to be retuned as workloads shift. The Linux kernel's process scheduler (CFS, and since Linux 6.6 its successor EEVDF) is a well-known example of weighted scheduling refined over many years; production job schedulers are usually much simpler.
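A minimal sketch of weighted round-robin with in-memory deques, assuming a single scheduler object that all workers call into (a real pool would need a lock or a database-backed claim). The class name and the `{"high": 4, "low": 1}` weights are illustrative. Skipping empty queues is what keeps the pool work-conserving: when one class is quiet, its share flows to the other.

```python
from collections import deque
from typing import Optional


class WeightedRoundRobin:
    """Serve up to `weight` jobs from each queue per round, in a fixed order.

    An empty queue's turn is skipped, so idle capacity flows to whichever
    queue has work.
    """

    def __init__(self, weights: dict[str, int]) -> None:
        self.queues: dict[str, deque] = {name: deque() for name in weights}
        # Expand weights into a rotation, e.g. {"high": 4, "low": 1}
        # becomes ["high", "high", "high", "high", "low"].
        self.rotation = [name for name, w in weights.items() for _ in range(w)]
        self.pos = 0

    def push(self, name: str, job: object) -> None:
        self.queues[name].append(job)

    def next_job(self) -> Optional[tuple[str, object]]:
        # Visit each rotation slot at most once; None means every queue is empty.
        for _ in range(len(self.rotation)):
            name = self.rotation[self.pos]
            self.pos = (self.pos + 1) % len(self.rotation)
            if self.queues[name]:
                return name, self.queues[name].popleft()
        return None
```

With both queues full, ten pulls yield four high, one low, four high, one low. With only low-priority work queued, every pull returns a low-priority job. The 80/20 probabilistic variant would replace the fixed rotation with something like `random.choices(names, weights=...)`, which needs the same fall-through to a non-empty queue.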
The right place to start is a two-level pattern: separate queues per priority class with a shared worker pool that picks via weighted round-robin. The weights are tuned to the observed traffic. The implementation is a few hundred lines of code that an engineering team can own.
The failure modes that show up regardless of pattern
Three failure modes show up regardless of which pattern you pick. The first is the slow-job problem: a job that takes ten minutes occupies the worker that picked it for all ten minutes, regardless of priority. The fix is a per-job timeout that is short relative to the queue's wait-time budget, plus visibility into per-job duration that surfaces slow jobs before they accumulate.
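A minimal sketch of the timeout half, assuming jobs are asyncio coroutines. `asyncio.wait_for` cancels the job when the timeout expires, and the `durations` dict is a hypothetical stand-in for a metrics sink. Cancellation is cooperative, so this only frees the worker if the job awaits; a CPU-bound job would need to run in a separate process that can be killed.

```python
import asyncio
import time
from typing import Awaitable


async def run_with_timeout(job_name: str, job: Awaitable[object],
                           timeout_s: float,
                           durations: dict[str, float]) -> bool:
    """Run one job with a hard timeout and record how long it held the worker.

    Returns True on success, False on timeout. `durations` is a hypothetical
    stand-in for whatever feeds the per-job duration dashboard.
    """
    start = time.monotonic()
    try:
        await asyncio.wait_for(job, timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        return False  # the job was cancelled; the worker is free again
    finally:
        durations[job_name] = time.monotonic() - start
```

A ten-minute job with a five-second timeout now holds the worker for about five seconds, and the recorded duration makes it show up on the dashboard instead of hiding inside an average.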
The second is the poison-pill problem: a job that crashes the worker on every retry occupies a worker indefinitely. The fix is a max-attempts policy that moves the job to a dead-letter queue after some number of failures, and a dead-letter queue that humans review.
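A minimal in-memory sketch of a max-attempts policy with a dead-letter list. The class and field names are hypothetical. The attempt counter is incremented before the handler runs: in a real system that increment would be persisted when the job is claimed, so a job that kills the worker process outright still counts toward the limit.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Job:
    job_id: str
    attempts: int = 0
    last_error: str = ""


@dataclass
class RetryingQueue:
    max_attempts: int
    pending: list[Job] = field(default_factory=list)
    dead_letter: list[Job] = field(default_factory=list)
    completed: list[str] = field(default_factory=list)

    def process_one(self, handler: Callable[[Job], None]) -> None:
        if not self.pending:
            return
        job = self.pending.pop(0)
        job.attempts += 1  # counted at claim time, before the job runs
        try:
            handler(job)
            self.completed.append(job.job_id)
        except Exception as exc:
            job.last_error = repr(exc)
            if job.attempts >= self.max_attempts:
                self.dead_letter.append(job)  # parked for human review
            else:
                self.pending.append(job)  # retry behind the rest of the queue
```

With `max_attempts=3`, a job that always raises is tried three times and then parked, while the healthy job ahead of it completes normally.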
The third is the priority-inflation problem: every team that owns a queue eventually wants its work classified as "high priority," and the system drifts toward all jobs being high-priority over time. The mitigation is governance — the priority classes should be defined operationally with explicit criteria, and the criteria should be enforced through code review or queue acceptance.
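One way to enforce the criteria through queue acceptance is a check at enqueue time. The sketch below assumes two illustrative criteria (is a user waiting, and how long is the job expected to run); the class definitions, thresholds, and `accept` function are hypothetical, and each team would write its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorityClass:
    name: str
    max_expected_runtime_s: float
    requires_waiting_user: bool  # criterion: is a person blocked on this job?


# Hypothetical, explicit criteria for each class.
CLASSES = {
    "high": PriorityClass("high", max_expected_runtime_s=5.0,
                          requires_waiting_user=True),
    "low": PriorityClass("low", max_expected_runtime_s=3600.0,
                         requires_waiting_user=False),
}


def accept(job_type: str, requested_class: str,
           expected_runtime_s: float, user_waiting: bool) -> None:
    """Reject an enqueue request whose declared properties do not fit the class."""
    cls = CLASSES[requested_class]
    if cls.requires_waiting_user and not user_waiting:
        raise ValueError(f"{job_type}: '{requested_class}' requires a waiting user")
    if expected_runtime_s > cls.max_expected_runtime_s:
        raise ValueError(f"{job_type}: too slow for '{requested_class}'")
```

Under these rules a password reset email is accepted as high priority, while a search index rebuild is rejected from the high-priority class and accepted as low. The point is less the specific thresholds than that asking for "high" has to be justified against written criteria.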
The five operational signals
The signals that tell you the queue is healthy are: per-queue depth (alert when above some threshold for a sustained period); per-queue oldest pending age (alert when above the queue's SLA); per-class p99 wait time (alert on regression); per-class success rate (alert when it drops); and per-class throughput (alert on unexpected drops). The first two are leading indicators; the last three are lagging indicators that confirm or refute the leading-indicator alerts.
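A minimal sketch of computing all five signals for one queue from job records, assuming each record carries enqueue, start, and finish timestamps (and that finished jobs always have a start time). The record shape, the nearest-rank p99, and the sliding window are assumptions; the alert thresholds themselves would be per-queue configuration compared against these values.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobRecord:
    queue: str
    enqueued_at: float
    started_at: Optional[float] = None   # None while still pending
    finished_at: Optional[float] = None  # None until the job ends
    succeeded: bool = False


def p99(values: list[float]) -> float:
    # Nearest-rank percentile: smallest value with >= 99% of values at or below it.
    ordered = sorted(values)
    return ordered[math.ceil(0.99 * len(ordered)) - 1]


def queue_signals(records: list[JobRecord], queue: str, now: float,
                  window_s: float) -> dict[str, float]:
    mine = [r for r in records if r.queue == queue]
    pending = [r for r in mine if r.started_at is None]
    finished = [r for r in mine
                if r.finished_at is not None and now - r.finished_at <= window_s]
    waits = [r.started_at - r.enqueued_at for r in finished]
    return {
        # Leading indicators: computed from work that has not started yet.
        "depth": float(len(pending)),
        "oldest_pending_age_s": max((now - r.enqueued_at for r in pending),
                                    default=0.0),
        # Lagging indicators: computed from jobs that finished in the window.
        "p99_wait_s": p99(waits) if waits else 0.0,
        "success_rate": (sum(r.succeeded for r in finished) / len(finished)
                         if finished else 1.0),
        "throughput_per_s": len(finished) / window_s,
    }
```

The split mirrors the text: depth and oldest pending age move as soon as work stops draining, while wait time, success rate, and throughput only move once jobs finish.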
The deeper observation
The underlying lesson is that priority is mostly an argument about resource allocation under contention, and the architecture that handles priority well is the architecture that allocates resources explicitly rather than implicitly. The naive single-queue-with-priority pattern allocates implicitly and breaks when the implicit allocation does not match reality. The multi-queue-with-shared-workers pattern makes the allocation explicit and tunable, which is what you want when the workload changes over time. CronPing uses a single-queue pattern because monitoring jobs are uniform; the more complex products would benefit from the multi-queue approach as their workloads diverge.