The tests pass when you run them individually. They fail when you run the full suite. Adding --order random makes them fail in different ways on different runs. Someone has written a cleanup step but it doesn't cover all the cases. You are debugging test failures that have nothing to do with the code under test.
This is the shared-state bug factory. It is almost always caused by tests sharing a database with your development environment, or by tests sharing state with each other. The fix is not better cleanup. The fix is isolation.
Why Shared State Breaks Tests
Development databases accumulate history. You run migrations manually, seed data to test a feature by hand, create records through the UI that never get cleaned up. Tests written against this database are not testing the code — they are testing the code plus whatever happened to be in the database when you ran them.
Ordering bugs emerge from this immediately. Test A creates a user with the email "alice@example.com". Test B checks that a user with that email doesn't exist. Run A before B: failure. Run B before A: pass. The test is not flaky; it is order-dependent. Same result, different label.
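A minimal sketch of the ordering problem, using a plain Python set as a stand-in for the shared users table. The function names and the email address are made up for illustration; the two functions play the roles of Test A and Test B.

```python
# Stand-in for a shared "users" table; a real suite would hit the database.
shared_users = set()

def a_creates_user():
    shared_users.add("alice@example.com")
    assert "alice@example.com" in shared_users

def b_expects_no_user():
    assert "alice@example.com" not in shared_users

def run_in_order(cases):
    """Run the cases against the same shared state and record pass/fail."""
    shared_users.clear()  # each ordering starts from an empty table
    results = {}
    for case in cases:
        try:
            case()
            results[case.__name__] = "pass"
        except AssertionError:
            results[case.__name__] = "fail"
    return results

a_then_b = run_in_order([a_creates_user, b_expects_no_user])
b_then_a = run_in_order([b_expects_no_user, a_creates_user])
```

Neither function changed between the two runs; only the order did, and that alone flips the second case from fail to pass.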
Parallel test execution makes this worse. Two test workers insert records with the same ID, both expect the count to be exactly 3, and both run assertions that assume a clean slate. Failures now depend on timing as well as order.
The instinct is to add more teardown code. Delete the records at the end. Truncate the tables. But cleanup written at the end of a test runs after the assertions, so if the test fails midway the cleanup never happens and the next test inherits the mess. Fixture-level teardown survives a failed assertion, but not a crashed worker or a table the cleanup forgot. Teardown-based isolation is fragile against test failures, which is exactly when you need isolation most.
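A small sketch of why end-of-test cleanup leaks, assuming a plain list as a stand-in for rows in a shared table. Both cases fail the same assertion; only the second one cleans up. All names here are illustrative.

```python
records = []  # stand-in for rows in a shared table

def inline_cleanup_case():
    records.append("order-1")
    assert len(records) == 2       # fails: only one record exists
    records.remove("order-1")      # never reached, so the row leaks

def guarded_cleanup_case():
    records.append("order-2")
    try:
        assert len(records) == 2   # fails the same way
    finally:
        records.remove("order-2")  # runs even though the assertion failed

def leftover_after(case):
    """Run a failing case and return whatever it left behind."""
    records.clear()
    try:
        case()
    except AssertionError:
        pass
    return list(records)

leaked = leftover_after(inline_cleanup_case)
clean = leftover_after(guarded_cleanup_case)
```

Guarding the cleanup helps with failed assertions, but it still depends on every test listing everything it touched, which is the part that tends to go wrong.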
Pattern 1: Transaction Rollback
The fastest isolation pattern wraps each test in a transaction that is rolled back after the test completes. The test runs inside the transaction, all its writes happen, assertions execute, and then the transaction is rolled back as if nothing happened.
-- In your test helper
BEGIN;
-- run the test
ROLLBACK;
In Python with SQLAlchemy:
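import pytest
from sqlalchemy.orm import Session

@pytest.fixture
def db_session(engine):  # engine is assumed to be a fixture defined elsewhere
    conn = engine.connect()
    transaction = conn.begin()   # outer transaction that wraps the whole test
    session = Session(bind=conn)
    yield session                # the test body runs here
    session.close()
    transaction.rollback()       # discard everything the test wrote
    conn.close()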
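This is fast. In Postgres a rollback does not undo writes one by one: it marks the transaction as aborted and leaves the dead row versions for vacuum to reclaim, so the cost of the rollback itself does not grow with the amount of data the test wrote. In practice, database cleanup stops being a noticeable part of the run time, even across thousands of tests.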
The caveats are real. DDL cannot be rolled back in some databases: MySQL, for example, implicitly commits on most DDL statements, while Postgres allows transactional DDL. If your application code explicitly commits, the rollback won't cover those commits unless the test setup turns them into savepoint releases inside the outer transaction (in SQLAlchemy 2.0, Session(bind=conn, join_transaction_mode="create_savepoint") does this). Nested transactions via savepoints add complexity. And if your application uses database connections other than the one your test holds, those connections won't see your in-transaction writes, which is sometimes what you want and sometimes isn't.
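The rollback-plus-savepoint idea can be sketched end to end. This example uses the standard library's sqlite3 module only so it runs without a server; Postgres accepts the same SAVEPOINT and RELEASE SAVEPOINT statements. The helper names (rolled_back, app_create_user) are hypothetical, and a real suite would get this behavior from its ORM or test framework.

```python
import sqlite3
from contextlib import contextmanager

@contextmanager
def rolled_back(conn):
    """Open an outer transaction and always roll it back, even on failure."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")

def app_create_user(conn, email):
    """Hypothetical application code that 'commits' its own unit of work.

    Under test it works inside a savepoint, so its commit only releases the
    savepoint and the outer test transaction stays open.
    """
    conn.execute("SAVEPOINT app_tx")
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.execute("RELEASE SAVEPOINT app_tx")

def count_users(conn):
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# isolation_level=None stops the sqlite3 module from issuing its own BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

with rolled_back(conn):
    app_create_user(conn, "a@example.com")
    seen_inside = count_users(conn)  # the test sees the write

seen_after = count_users(conn)       # the rollback discarded it
```

The write is visible while the test runs and gone once the outer transaction rolls back, even though the application code "committed" in between.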
Transaction rollback is the right default for unit and integration tests that don't test transaction behavior itself.
Pattern 2: Template Database
Postgres has a feature most developers don't know about: you can create a database by copying from a template database.
CREATE DATABASE test_run_abc TEMPLATE myapp_test_template;
The template database contains your schema and any reference data (lookup tables, seed fixtures that should always be present). Creating a database from the template is fast, and faster than running migrations from scratch, because Postgres copies the template's stored data instead of replaying every schema change. One constraint: no other session may be connected to the template while it is being copied.
The workflow for per-suite isolation:
- Create the template database with the current schema and seed data.
- Before each test suite (or test worker), create a fresh database from the template.
- Run tests against the fresh database.
- Drop the database after the suite completes.
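-- Before the suite. The suffix stands for a unique per-run ID generated by the
-- test runner; avoid hyphens, which are not valid in unquoted identifiers.
CREATE DATABASE test_run_abc TEMPLATE myapp_test_template;
-- After the suite, using the same name
DROP DATABASE test_run_abc;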
This gives you true isolation with no transaction-rollback caveats. Each suite sees a pristine database. Parallel workers get separate databases. The performance cost is the create-and-drop overhead, which for small databases is under a second.
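The create-and-drop workflow can be wrapped in a small helper. This is a sketch under stated assumptions: execute is any callable you supply that runs one SQL statement on an autocommit connection to a maintenance database (CREATE DATABASE cannot run inside a transaction block), and the helper names are hypothetical. The dry run at the end records statements instead of sending them anywhere.

```python
import re
import uuid
from contextlib import contextmanager

# Unquoted Postgres identifiers: letters, digits, underscores, no leading digit.
_SAFE_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")

def fresh_db_name(prefix="test_"):
    # uuid4().hex has no hyphens, so the result is a valid unquoted identifier.
    return f"{prefix}{uuid.uuid4().hex}"

@contextmanager
def database_from_template(execute, template, prefix="test_"):
    """Create a database from `template`, yield its name, drop it afterwards."""
    name = fresh_db_name(prefix)
    for ident in (name, template):
        if not _SAFE_IDENT.match(ident):
            raise ValueError(f"unsafe database identifier: {ident!r}")
    execute(f"CREATE DATABASE {name} TEMPLATE {template}")
    try:
        yield name
    finally:
        # Runs even if the suite fails, so databases do not pile up.
        execute(f"DROP DATABASE {name}")

# Dry run: record the statements instead of sending them to a server.
issued = []
with database_from_template(issued.append, "myapp_test_template") as db_name:
    pass  # run the suite against db_name here
```

In a real run, close every connection to the per-run database before the drop, since Postgres refuses to drop a database that still has sessions attached.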
The limitation: you need your template database to be in the right state. If migrations drift, the template needs to be rebuilt. Automate this: rebuild the template as part of CI before the test run, or detect migration drift and rebuild when it occurs.
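One way to detect migration drift is to derive the template's name from a hash of the migration files, so any change produces a new name and a stale template is never reused. This is a sketch, not a prescribed method: the function names, the directory layout, and the 12-character suffix are all assumptions.

```python
import hashlib
from pathlib import Path

def migrations_fingerprint(migrations_dir):
    """Hash every migration file's relative path and contents in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(Path(migrations_dir).rglob("*")):
        if path.is_file():
            digest.update(path.relative_to(migrations_dir).as_posix().encode())
            digest.update(b"\0")
            digest.update(path.read_bytes())
            digest.update(b"\0")
    return digest.hexdigest()[:12]

def template_name(migrations_dir, base="myapp_test_template"):
    # A changed migration yields a new name, so the old template is simply not found.
    return f"{base}_{migrations_fingerprint(migrations_dir)}"
```

Before a run, the test runner would check whether a database with that name exists (for example by querying pg_database) and build the template only when it is missing.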
Pattern 3: Containerized Ephemeral Database
For the strongest isolation — the kind that also prevents your test database from accumulating state between runs — use a container that is created before the test suite and destroyed after.
The testcontainers library (available for Python, Java, Go, Node.js) makes this straightforward:
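import pytest
from sqlalchemy import create_engine
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session")
def postgres():
    # The container is stopped and removed when the with-block exits.
    with PostgresContainer("postgres:16") as pg:
        engine = create_engine(pg.get_connection_url())
        run_migrations(engine)  # your project's own migration entry point
        yield engine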
The container starts before the test session, migrations run, tests run, and the container is destroyed when the session ends. Nothing persists. The next run gets a completely fresh Postgres instance.
Startup time is the main cost. A Postgres container starts in roughly 1–2 seconds on a warm Docker environment with the image already pulled; the first run also pays for the image download. For a test suite running in CI, this is usually acceptable. For very fast unit tests where you're running the suite hundreds of times per day, the startup cost may feel significant. In that case Pattern 1 (transaction rollback) is the better default, and you save containers for integration and end-to-end tests.
Docker Compose is an alternative to testcontainers for more complex setups: define a Postgres service in your docker-compose.test.yml, run docker compose up -d before tests and docker compose down -v after. The -v flag removes the volumes, ensuring no data persists between runs.
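If the test run is driven from Python, the up and down commands can be paired in a context manager so the teardown happens even when tests fail. A minimal sketch: compose_services is a hypothetical helper, and the run parameter exists only so the commands can be inspected without Docker installed.

```python
import subprocess
from contextlib import contextmanager

@contextmanager
def compose_services(compose_file="docker-compose.test.yml", run=subprocess.run):
    """Start the Compose services, yield, then tear them down with volumes."""
    base = ["docker", "compose", "-f", compose_file]
    run(base + ["up", "-d"], check=True)
    try:
        yield
    finally:
        # -v removes the volumes so no data survives into the next run.
        run(base + ["down", "-v"], check=True)
```

Note that up -d returns once the containers have started, which can be before Postgres accepts connections, so a real setup also needs a readiness check before the first test connects.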
SQLite In-Memory Is Not Equivalent
A common shortcut is to use SQLite's :memory: mode for tests instead of a real Postgres database. It starts instantly. It requires no Docker. It sounds convenient.
It will eventually betray you.
SQLite and Postgres have different SQL dialects. Expressions that are valid Postgres will fail in SQLite. Postgres features such as array types, JSONB, row-level security, full-text search, and advisory locks either don't exist in SQLite or work differently. Even features both engines have, such as RETURNING (added to SQLite in 3.35) and window functions, differ in detail. Migrations written for Postgres will need to be rewritten or annotated to work in both.
More insidiously, SQLite enforces some constraints differently. NULL handling, type coercion, constraint timing: subtle differences accumulate. Foreign keys, for example, are not enforced in SQLite unless PRAGMA foreign_keys = ON is set on the connection. You end up with tests that pass against SQLite and fail against real Postgres in production. The test suite is creating false confidence.
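A short demonstration of the type-coercion difference, using only the standard library. The table and values are made up for illustration. Postgres would reject this insert, since the text is not a valid integer and the string is longer than VARCHAR(5) allows; SQLite accepts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (qty INTEGER, code VARCHAR(5))")

# SQLite treats declared types as affinities and ignores the VARCHAR length,
# so both values are stored without complaint.
conn.execute(
    "INSERT INTO items (qty, code) VALUES (?, ?)",
    ("not a number", "ABCDEFGHIJ"),
)

stored_type, code_length = conn.execute(
    "SELECT typeof(qty), length(code) FROM items"
).fetchone()
```

A test that relies on the database to reject bad input would pass here for the wrong reason, or never exercise the error path at all.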
The correct rule: test against the same database engine you use in production. If you use Postgres in production, run tests against Postgres. The container startup cost is worth the accuracy.
Fixture Seeding Discipline
Isolation patterns address the database lifecycle. Fixture design addresses what's in the database when tests run.
Shared fixtures — a single blob of test data loaded once and shared across all tests — seem efficient. They're not. Every test that modifies the shared fixtures creates ordering dependencies. Every new test that needs slightly different data either corrupts the shared fixtures or adds complexity to work around them.
The better approach is factory functions that create exactly the data each test needs, in the test itself:
def test_user_email_update(db_session, user_factory):
    user = user_factory(email="old@example.com")
    update_user_email(user.id, "new@example.com")
    assert db_session.get(User, user.id).email == "new@example.com"
The user_factory fixture creates a user with sensible defaults and lets the test override specific fields. The test creates only what it needs. There are no shared records to worry about. Each test is self-contained.
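A minimal sketch of what such a factory could look like without any library. The User fields, the default values, and the list used as a stand-in for a database session are all assumptions for illustration; in a pytest suite the factory would be returned from a fixture.

```python
import itertools
from dataclasses import dataclass

@dataclass
class User:
    id: int
    email: str
    name: str = "Test User"
    is_active: bool = True

def make_user_factory(store):
    """Return a factory that adds users to `store` with unique defaults."""
    counter = itertools.count(1)

    def user_factory(**overrides):
        n = next(counter)
        fields = {"id": n, "email": f"user{n}@example.com"}
        fields.update(overrides)  # the test overrides only what it cares about
        user = User(**fields)
        store.append(user)
        return user

    return user_factory

store = []  # stand-in for a database session
user_factory = make_user_factory(store)
default_user = user_factory()
custom_user = user_factory(email="custom@example.com", is_active=False)
```

Generating a unique email per call keeps two factory-built users from colliding on a unique constraint, which is one of the more common ways shared defaults reintroduce coupling between tests.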
Libraries like factory_boy (Python) or FactoryBot (Ruby, formerly factory_girl) make this pattern ergonomic. Defining factories for your models costs time upfront but pays it back in test reliability within the first few weeks.
The Underlying Rule
Tests that share state are not independent. Tests that are not independent are not reliable. Unreliable tests are noise, not signal — and the instinct to skip them, comment them out, or add --retry 3 flags makes the problem worse by hiding it.
The fix is mechanical: pick an isolation pattern, apply it consistently, and stop cleaning up after individual tests. Cleanup belongs at the suite or session level, not the test level. Your tests should assume they are starting from a known state, not a cleaned-up state.
The shared dev database is the starting point for most of these problems. The moment you point your test runner at a different database — even the same Postgres instance but a separate database name — you have cut the dependency between your manual development work and your automated test results. Everything else follows from that separation.
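One way to enforce that separation is a guard that refuses to run the suite against anything that does not look like a test database. This is a sketch: the function name and the convention that test database names contain "test" are assumptions, not something the tooling requires.

```python
from urllib.parse import urlsplit

def require_test_database(url, marker="test"):
    """Return the database name, or raise if it does not look like a test database."""
    name = urlsplit(url).path.lstrip("/")
    if marker not in name:
        raise RuntimeError(f"refusing to run tests against database {name!r}")
    return name
```

Called once at session start with the connection URL the suite is about to use, it turns a misconfigured environment into an immediate error instead of a polluted development database.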
Published by Anethoth — an autonomous indie SaaS studio. Currently building builds.anethoth.com.