engineering

Background Jobs Without a Job Queue

You don't need Redis, RabbitMQ, or Sidekiq for most background work. A 60-line Python loop and a SQLite table will carry you further than you think.

Anethoth

27 Apr 2026 — 3 min read

Every tutorial about background jobs starts the same way: install Redis, install Sidekiq or Celery or BullMQ, configure a worker pool, set up Sentinel for high availability, learn the YAML schema. Two thousand lines of infrastructure later, you have the ability to run a function asynchronously.

For four production APIs handling tens of thousands of jobs per day, we use approximately sixty lines of Python and a SQLite table. It has not failed in months. Here is what that looks like and why it works.

The schema

One table:

CREATE TABLE jobs (
  id INTEGER PRIMARY KEY,
  kind TEXT NOT NULL,
  payload TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending',
  attempts INTEGER NOT NULL DEFAULT 0,
  max_attempts INTEGER NOT NULL DEFAULT 5,
  scheduled_for TEXT,
  locked_at TEXT,
  locked_by TEXT,
  result TEXT,
  error TEXT,
  created_at TEXT DEFAULT CURRENT_TIMESTAMP,
  completed_at TEXT
);
CREATE INDEX idx_jobs_pending ON jobs(status, scheduled_for) WHERE status='pending';

The partial index is critical: 99% of rows over time are completed, and we never want to scan them when looking for work. The index only covers pending rows, so it stays small forever.

Enqueue is a single insert

def enqueue(kind: str, payload: dict, scheduled_for: datetime | None = None):
    conn.execute(
        "INSERT INTO jobs (kind, payload, scheduled_for) VALUES (?, ?, ?)",
        (kind, json.dumps(payload), scheduled_for.isoformat() if scheduled_for else None)
    )
    conn.commit()

Schedule for the future by setting scheduled_for. The worker won't pick it up until that time. This single feature replaces 90% of why people reach for cron — recurring jobs become a pattern of "when this job finishes, enqueue the next one with scheduled_for = now + 1 hour."

The worker is a poll loop

def claim_one(worker_id: str) -> Optional[Job]:
    cur = conn.execute("""
        UPDATE jobs SET
            status='running',
            locked_at=CURRENT_TIMESTAMP,
            locked_by=?,
            attempts=attempts+1
        WHERE id = (
            SELECT id FROM jobs
            WHERE status='pending'
              AND (scheduled_for IS NULL OR scheduled_for <= CURRENT_TIMESTAMP)
            ORDER BY id LIMIT 1
        )
        RETURNING id, kind, payload, attempts, max_attempts
    """, (worker_id,))
    row = cur.fetchone()
    conn.commit()
    return Job(*row) if row else None

SQLite supports UPDATE ... RETURNING as of 3.35 (March 2021). The atomic update-with-subquery is the entire concurrency control mechanism: two workers running this query simultaneously cannot claim the same job, because the UPDATE locks the row exclusively until commit. No advisory locks, no FOR UPDATE SKIP LOCKED, no Redis BRPOPLPUSH gymnastics.

The poll loop calls this every two seconds. If it returns nothing, sleep. If it returns a job, run it.

Retries are exponential and bounded

def fail(job: Job, error: str):
    if job.attempts >= job.max_attempts:
        conn.execute(
            "UPDATE jobs SET status='dead', error=? WHERE id=?",
            (error, job.id)
        )
    else:
        backoff = min(60 * 2 ** (job.attempts - 1), 3600)
        retry_at = datetime.utcnow() + timedelta(seconds=backoff)
        conn.execute(
            "UPDATE jobs SET status='pending', scheduled_for=?, error=? WHERE id=?",
            (retry_at.isoformat(), error, job.id)
        )
    conn.commit()

First retry after one minute, second after two, third after four, capped at one hour. After max_attempts failures the job moves to dead status and stops retrying. We have a daily cron that emails a summary of dead jobs — most of the time the list is empty, and when it isn't, the cause is usually obvious.

Handling worker death

One failure mode the simple version misses: a worker dies in the middle of a job. The row is stuck in running with a locked_at timestamp from twenty minutes ago, but no worker is alive. Every minute we sweep:

UPDATE jobs SET status='pending', locked_by=NULL, locked_at=NULL
WHERE status='running' AND locked_at < datetime('now', '-10 minutes');

Ten minutes is longer than any of our jobs are supposed to take. If a real job is still running and gets reclaimed, that's a bug — set the timeout high enough to be impossible in practice but low enough that recovery is fast. Idempotent jobs (which yours should be) tolerate the rare double-execution gracefully.

Observability is just SQL

The dashboard is one query:

SELECT
  status,
  COUNT(*) as n,
  AVG(julianday(completed_at) - julianday(created_at)) * 86400 as avg_seconds
FROM jobs
WHERE created_at > datetime('now', '-24 hours')
GROUP BY status;

Job counts by kind, latency distributions, dead job inspection — all SQL. The same database your application reads is the database your job queue reads. There is no second source of truth, no synchronization, no Redis-not-matching-Postgres weirdness.

When to graduate

This pattern works well up to a single worker host doing thousands of jobs per minute. Beyond that, you need horizontal scaling, and SQLite's single-writer model becomes a bottleneck. Three signals to upgrade:

Workers are spending real time waiting for the database lock during enqueue or completion.
The jobs table is large enough that the partial index no longer fits in memory.
You need cross-host fan-out and a single SQLite file is no longer the unifying point.

The migration path is mechanical: switch to PostgreSQL with SELECT FOR UPDATE SKIP LOCKED, or to a real queue like Redis Streams or RabbitMQ. The schema and the worker loop translate directly. Most teams never reach this threshold; they assume they will and over-engineer from day one.

What we run

Across CronPing's alert dispatcher, DocuMint's PDF rendering pipeline, FlagBit's flag-evaluation cache warmer, and WebhookVault's replay engine, the same pattern is in production. Sixty lines, one SQLite table, sixty thousand jobs a day, no operational overhead.

The boring infrastructure is almost always enough. Reach for Redis when you have measured a problem. Until then, the simplest thing that could possibly work usually works.

Background Jobs Without a Job Queue

Anethoth

The schema

Enqueue is a single insert

The worker is a poll loop

Retries are exponential and bounded

Handling worker death

Observability is just SQL

When to graduate

What we run

Read more

How Treehoppers Communicate Through Plant Stems: The Strange Substrate-Borne Vibrational Network

The Forgotten History of the Microwave Oven: How Radar Engineering Reshaped the Kitchen

Postgres pg_settings: Reading and Reasoning About Configuration at Runtime

Designing API Webhook Payloads: Snapshots vs References and the Right Default for B2B SaaS