Time-Series Data in SQLite: Patterns Before Reaching for TimescaleDB

Time-series data is the canonical case for reaching for a specialized database. Metrics, events, traces, logs, monitor pings, webhook deliveries, audit records — every product accumulates rows ordered by time, and the conventional wisdom is that handling them at scale requires TimescaleDB, InfluxDB, ClickHouse, or some other purpose-built engine. The conventional wisdom is right at scale. It is also wrong about where the scale threshold actually sits.

SQLite, with the right patterns, comfortably handles billions of rows of time-series data with sub-millisecond query latency. Most products that reach for a specialized time-series database would have been better served learning four specific SQLite patterns first. This is the case for those patterns and a description of what each one buys.

The shape of the data, and why it matters

Time-series data has a particular shape: rows are inserted in approximately time-order, queries are almost always bounded by a time range, and aggregation by time bucket (hour, day, week) is the dominant analytical pattern. The optimizations specialized databases make are tuned to this shape. The good news is that SQLite can be tuned to the same shape with a few schema and indexing decisions.

The bad news is that if you treat time-series data as ordinary relational data — a generic events table with a primary key and indexes on whatever columns the queries happened to need — SQLite will work but slowly, and the slowness will read as evidence that you need a different database. The slowness is actually evidence that you need different patterns.

Pattern 1: Composite primary key with time first

The single most important schema decision is to make the primary key start with the timestamp column, with a tiebreaker for uniqueness. Concretely: PRIMARY KEY (timestamp, id) where id is a per-row UUID or sequence. This causes SQLite to physically order rows by timestamp on disk via the rowid alias for INTEGER PRIMARY KEY tables, or via the index B-tree for WITHOUT ROWID tables.

The performance consequence is that range queries by timestamp — which are the queries you actually run — become sequential reads through contiguous pages rather than random-access B-tree traversals. A query for "the last hour of events" becomes one or two page reads regardless of table size. A query for "events between yesterday and the day before" becomes one seek and a sequential scan. SQLite's query planner picks this up automatically when the WHERE clause is on the leading column of the primary key.

The trap most teams fall into is using id INTEGER PRIMARY KEY with the timestamp as a secondary indexed column. This works, but every range query by timestamp pays the cost of looking up rowids in the index and then random-access fetching the rows from the heap. At a few million rows the difference is small. At a hundred million rows it is the difference between sub-millisecond and multi-second queries.

Pattern 2: WAL mode with synchronous=NORMAL

Default SQLite journaling mode is rollback journal with synchronous=FULL. This is the safest configuration and also the slowest under heavy write load because every write triggers an fsync. For time-series workloads where you can tolerate the loss of the last few seconds of writes if the OS crashes, switching to journal_mode=WAL and synchronous=NORMAL typically increases write throughput 10-100x.

WAL mode allows readers and writers to operate concurrently without blocking each other. This matters because time-series workloads usually involve continuous writes from ingestion plus periodic reads from dashboards or aggregation jobs. With rollback journal mode the readers block the writers and vice versa. With WAL mode they don't.

synchronous=NORMAL means SQLite syncs the WAL to disk at checkpoint boundaries (every few thousand pages by default) rather than after every transaction. The durability guarantee is that you will lose at most the last few seconds of committed transactions in a crash, which is acceptable for almost every time-series workload because you are not running a payment system on it.

Pattern 3: Aggregation tables, not aggregation queries

Time-series queries are dominated by aggregations: count of events per hour, sum of values per day, average response time per minute. Computing these on the fly from raw data is fine for a million rows and unsustainable beyond ten million. The pattern that scales is pre-aggregation into rollup tables.

The schema is one rollup table per granularity: events_hourly with one row per (hour, group-key) tuple containing aggregated counts and sums, events_daily rolling up events_hourly, and so on. The ingestion path writes to the raw events table. A periodic job (every 5 minutes is a reasonable starting cadence) reads new raw rows, aggregates them into rollup rows, and updates a watermark indicating how far the rollup is caught up.

Dashboard queries hit the rollup tables, not the raw events. The rollup tables are small (one row per hour per group instead of thousands of rows per hour), the queries are simple sums and averages, and the latency is sub-millisecond. The raw events table can grow to billions of rows because the dashboard never queries it directly — it is only consulted when a customer asks for a specific event by ID, in which case the timestamp-first primary key makes the lookup fast anyway.

This is essentially what TimescaleDB's continuous aggregates do under the hood. The TimescaleDB version is more elegant because the maintenance is automatic. The SQLite version is a periodic job you write. The job is a hundred lines of code and you can debug it.

Pattern 4: Monthly partitioning via ATTACH DATABASE

Beyond a billion rows in a single table, even the best-tuned schema starts to feel the cost of B-tree depth and database file size. The pattern that scales further is monthly partitioning: each month's data lives in a separate database file, and queries that span multiple months read from multiple databases via SQLite's ATTACH DATABASE mechanism.

The schema is identical across partitions. The ingestion path opens the current month's database and writes to it. A nightly job rotates partitions and detaches old ones. Dashboard queries that need a date range attach the relevant partitions, run a UNION ALL across them, and detach. Reads are still fast because each partition is small. Old partitions can be archived to cold storage by literally copying the file.

The cost is operational: you are managing a directory of database files instead of one. The benefit is that no individual file ever grows past the size where SQLite starts to feel slow, and old data archival is a file-copy operation rather than a database-export operation. We use this pattern across the four products for high-volume time-series data: webhook deliveries on WebhookVault, monitor pings on CronPing, flag-evaluation events on FlagBit, and PDF generation events on DocuMint.

When to actually graduate

The signs that SQLite is no longer the right choice for time-series data are specific. Sustained write throughput exceeding what one machine can durably commit, which is in the tens of thousands of rows per second on modern SSDs. Query patterns that fundamentally require time-series-specific features like gapfilling or time-bucketed window functions over very wide time ranges. Multi-region replication requirements that SQLite does not address. Operational requirements like point-in-time recovery that are easier in PostgreSQL with its WAL replication infrastructure.

None of these signs are common at the scale most products operate at. The signs that look like graduation triggers but usually are not: queries are slow (this is almost always a missing index or wrong primary key), the database file is large (this is almost always retention policy missing), the machine is at high CPU (this is almost always missing rollup tables). All three of these have SQLite-internal solutions before the database choice becomes the bottleneck.

The deeper point

Time-series data is a workload, not a database choice. The workload has specific characteristics — append-mostly writes, range-bounded queries, time-bucketed aggregations — and the patterns that work for the workload are similar across databases. SQLite happens to be very good at this workload when configured correctly, and the configuration is mostly four schema decisions you can make on day one.

The case for trying SQLite first is not that it is universally better than TimescaleDB. The case is that the operational cost of running SQLite is approximately zero, the operational cost of running TimescaleDB is non-zero and ongoing, and you can switch later. The case against trying SQLite first is the case that you already know TimescaleDB and the team is faster with it, which is sometimes true and worth respecting. The case that is wrong is the reflexive case that says time-series equals specialized database. Time-series equals specific access patterns, and specific access patterns can be served by general-purpose databases that you already know and run.