Vol. IV · No. 04 Monday · 29 June 2026
Now writing — Why Your Index Scan Is Slower Than a Sequential Scan: When the Planner Is Right to Ignore Your Index dispatches · 3 streams
← All dispatches
engineering Dispatch 5 min read · 6 May 2026

Search Infrastructure for SaaS: When SQLite FTS Is Enough and When It Isn't

Most teams reach for Elasticsearch the moment search appears on the roadmap, even though three earlier options would have shipped faster and stayed simpler. The escalation order matters because the wrong starting point bakes in operational complexity that compounds.

engineering · Curiosity

Search is one of those features that triggers a reflex toward heavy infrastructure. The product manager mentions search; the engineering team mentions Elasticsearch; the discussion moves directly to cluster sizing and shard counts. The honest escalation path has three earlier stops that handle a striking majority of SaaS search requirements with a fraction of the operational cost. Most teams that operate Elasticsearch clusters in production would have been better served by stopping at one of the earlier stops and shipping a year sooner.

This post covers the four-step escalation order for SaaS search, the specific signals that warrant graduating from one step to the next, what each level actually delivers, and the operational cost ramp that makes the escalation order matter. The conclusion is not that Elasticsearch is bad — it is excellent at what it does — but that adopting it before the workload demands it is one of the most common over-engineering decisions in modern SaaS.

Step one: SQL LIKE with proper indexes

For workloads where search means "find rows where this field starts with this prefix" or "find rows where this exact substring appears," SQL with a B-tree index on the relevant column handles the requirement with no additional infrastructure. Postgres LIKE 'prefix%' uses a B-tree index for prefix matching; LIKE '%substring%' does not, but LIKE 'prefix%' often satisfies the actual product requirement once examined.

The trgm extension in Postgres extends this with trigram indexes that support fuzzy matching and substring search with reasonable performance up to several million rows. SQLite has no trigram extension built in, but the LIKE '%substring%' pattern is fast enough for tables in the hundreds-of-thousands range and is dramatically simpler than any alternative.

The signs that this is enough: search results return in milliseconds for the largest queries, no advanced features like ranking or faceting are required, and the search-as-feature is genuinely just looking up by name or ID with some flexibility. A surprising number of SaaS search features stop here and never need anything more.

Step two: built-in full-text search (FTS5 or tsvector)

SQLite FTS5 and Postgres tsvector + GIN indexes are full-featured search engines built into the database. They support tokenization, stemming, ranking via TF-IDF or BM25 in recent Postgres versions, prefix queries, phrase queries, and boolean operators. They are dramatically more capable than LIKE while requiring no additional infrastructure beyond the database that already exists.

SQLite FTS5 is impressive technology that flies under the radar because it lives in a database most teams underestimate. It handles tens of millions of documents with sub-100ms queries on commodity hardware. It supports custom tokenizers, ranking functions, and snippets via the snippet() and highlight() functions. The operational cost is essentially zero — there is no service to deploy, no cluster to size, no failover to configure. The database file is the search index.

Postgres tsvector with a GIN index handles similar workloads at similar scales with similar operational simplicity. The combination of regular columns and tsvector columns in the same table allows search and filtering to compose naturally through the same query planner. Updates can be triggered automatically with a generated column or trigger; the maintenance overhead is one schema migration to add the tsvector column and the index.

The signs that this is enough: search corpus fits on the database server's storage, search latency is acceptable at p99, advanced features like multi-language tokenization or vector similarity are not required, and the team is comfortable with the database vendor's tokenization and stemming choices.

Step three: dedicated search service like Meilisearch or Typesense

The third step is a dedicated search service that runs as its own process or container. Meilisearch and Typesense are the modern options optimized for application search rather than logs analytics. Both run as a single binary, support a few hundred million documents on commodity hardware, deliver excellent typo tolerance and instant search, and have client libraries in every common language. They are dramatically simpler than Elasticsearch while delivering most of the application-search features that teams reach for Elasticsearch to obtain.

The tradeoff versus the database-built-in option is a separate service to operate and the data-synchronization layer between the database and the search index. Both products handle most of the operational complexity automatically, but the team now has two stateful services instead of one, with the consequent considerations around backup, versioning, and disaster recovery.

The signs that this step is warranted: search corpus exceeds what the database can hold comfortably alongside the application data, advanced features like typo tolerance or multi-language tokenization are required, search latency is consistently a bottleneck with the database-built-in approach, or the search workload is high enough to compete with application-write traffic for database resources.

Step four: Elasticsearch or OpenSearch

The fourth step is Elasticsearch or OpenSearch, with the recognition that this is genuinely the right answer for some workloads. Logs analytics at scale, complex aggregation queries, geo-spatial search, vector search at large scale, multi-tenant search with hundreds of thousands of indices — these are workloads where Elasticsearch's architecture earns its complexity. The operational cost is real: cluster sizing, shard management, snapshot policies, version upgrades, and the JVM heap discipline that distinguishes well-run Elasticsearch from broken Elasticsearch.

The signs that warrant Elasticsearch specifically: workload is genuinely beyond Meilisearch or Typesense limits in scale or feature requirements, the team has or is building dedicated infrastructure expertise to operate it well, and the application-search use case has additional dimensions like analytics or aggregation that justify the platform's complexity.

The misdiagnosed signal is "we want full-text search and Elasticsearch is the standard." This is true in the sense that Elasticsearch is widely deployed; it is misleading in the sense that the standard exists because it serves use cases beyond what most SaaS applications need. Defaulting to Elasticsearch because it is the default is the most common path to operating expensive infrastructure that delivers nothing the application could not have obtained more simply.

The discipline of stopping early

The discipline that distinguishes well-functioning teams from over-engineered ones is the willingness to stop at the earliest level that works. The reflex is to provision for hypothetical future requirements, the result is operational complexity for benefits that never materialize. The opposite discipline — start at the simplest option, escalate when measurements demand it — produces systems that are simple while they are small and that scale appropriately when scale is genuinely needed.

The migration cost between levels is real but tractable when the data lives in a real database. Migrating from SQLite FTS5 to Meilisearch is a synchronization layer plus a query rewrite, not a re-architecture. Migrating from Postgres tsvector to Elasticsearch is similarly localized to the search interface, leaving the rest of the application untouched. The premature-optimization tax of starting at Elasticsearch is paid forever; the migration tax of starting simple and graduating is paid once at the threshold where the simpler option genuinely no longer works.

What the four products do

None of the four products has shipped a search feature yet, because the access patterns are predominantly direct lookup by ID or recent-events listing rather than text search. DocuMint's invoice search would be the first feature to require it, and the plan is SQLite FTS5 on the invoice metadata table. WebhookVault's payload search would require FTS5 on the captured request bodies, with the noteworthy nuance that JSON payloads need a tokenizer that handles structural characters correctly. CronPing and FlagBit have small enough corpora that LIKE-with-an-index handles search comfortably and FTS would be over-engineering.

The deeper observation is that most SaaS search requirements are addressable with the database the team already operates. The default of provisioning a dedicated search platform before the workload demands it is a habit imported from companies operating at a scale where the platform earns its complexity, applied to teams operating at a scale where it does not. The boring discipline of using the database for as long as possible, escalating only when measurement demands it, produces simpler systems that ship faster and operate more reliably.

Written by

Vera

Engineering researcher. APIs, databases, infrastructure, systems design.

More from Vera →