
Pre-Production Environments: The Three-Tier Pattern That Catches Real Bugs

Most teams have a staging environment that nobody trusts. The reasons are predictable: stale data, divergent configuration, and a workflow that treats staging as someone else's problem. The three-tier pattern of dev, preview, and staging, with explicit responsibilities for each, makes staging a

Anethoth

11 May 2026

The standard advice — have a staging environment that mirrors production — is correct but incomplete. Staging environments rot for predictable reasons: the data drifts, the configuration drifts, the deployments lag behind production, and nobody owns the rot until staging is so unreliable that teams skip it entirely and deploy straight to production. The decay is gradual and the recovery is expensive.

We run pre-production environments for DocuMint, CronPing, FlagBit, and WebhookVault, and the version that actually catches bugs is not one environment but three, with explicit and different purposes.

The three tiers

Development is the engineer's laptop or container. Synthetic data, no Stripe webhooks, no real customers, optimized for fast iteration. The job here is to write code and run unit tests. Development is allowed to lie about production: the database can be tiny, the rate limits absent, the third-party integrations mocked.

Preview is a per-branch environment, spun up on push and torn down on merge. It uses synthetic data seeded from production fixtures, real Stripe test mode, and real but disposable database state. The job here is to test integration: does this branch's code work with the surrounding services as they currently exist on the main branch? Preview is allowed to break frequently because nobody depends on it for more than a few hours.

Staging is a long-lived environment that mirrors production as closely as legally and operationally possible. The same Caddy configuration, the same Docker images (latest from main), the same database schema, periodic data refreshes from anonymized production snapshots, and the same external integrations (Stripe test mode rather than live, but otherwise identical). The job here is to catch the bugs that only appear at production-scale data and production-shaped configuration.
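A per-branch preview tier needs a stable, DNS-safe name for every branch. The sketch below shows one way that could work. The function name, the length limit, and the hash suffix are illustrative assumptions, not a description of the tooling behind the environments above.

```python
import hashlib
import re


def preview_env_name(branch: str, max_len: int = 40) -> str:
    """Turn a git branch name into a DNS-safe preview environment name.

    Hypothetical helper: the naming scheme is an assumption for illustration.
    """
    # Lowercase, then collapse every run of non-alphanumeric characters to "-".
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    # A short hash of the original branch name keeps names distinct when two
    # branches collapse to the same slug or the slug has to be truncated.
    suffix = hashlib.sha256(branch.encode("utf-8")).hexdigest()[:6]
    head = slug[: max_len - len(suffix) - 1].strip("-") or "branch"
    return f"{head}-{suffix}"


if __name__ == "__main__":
    print(preview_env_name("feature/Add-Webhooks_v2"))
```

Because the name is derived only from the branch, the job that creates the environment on push and the job that tears it down on merge can both compute it without sharing state.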

The failure mode the pattern prevents

The most common pre-production failure is the "works in staging, fails in production" surprise. This usually means staging diverged from production in some way: stale data, missing migrations, different rate limits, expired test credentials. The three-tier pattern prevents this by giving staging a single explicit job and making sure everyone agrees that staging means "production behavior on safe data."

Preview catches the bugs that depend on integration: a new API consumer breaking a producer's contract, a configuration change that needs to be rolled out in a specific order, a migration that conflicts with another branch's migration. These are bugs that pure unit tests can't catch and that you don't want to wait until staging to find.

Development catches the bugs that depend on isolated logic: incorrect calculations, edge case handling, off-by-one errors. These are bugs that don't need a full stack to find.

The configuration discipline that makes staging trustworthy

Staging only earns trust if it stays close to production. The configuration discipline is the load-bearing part:

The same code path. Staging runs the same Docker image as production, with the only difference being environment variables. No "staging-only" code paths. No conditional logic that reads STAGING_MODE.

The same dependencies. Staging uses the same database version, the same Redis version (if any), the same Caddy version. Drift in dependencies is the slowest, most insidious source of staging bugs.

The same migrations. The same migration scripts run in the same order. The staging schema is allowed to lag production by hours, not days.

The same external integrations. Stripe test mode closely mirrors live-mode behavior, so staging exercises the same integration code with test keys. Plausible Analytics has a separate property but runs the same client. SMTP sends real mail to a test address that the team monitors.

The same monitoring. Staging emits the same metrics, the same logs, the same alerts. Bugs in observability are easiest to catch in staging where the consequences are limited.
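These rules are easier to keep when a script checks them. The sketch below compares what two environments report about themselves and lists the differences. The `EnvSnapshot` fields and the way the data is collected are assumptions for illustration. The version strings and migration names in the example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvSnapshot:
    """What one environment reports about itself (hypothetical shape)."""
    dependency_versions: dict[str, str]
    applied_migrations: list[str]
    env_var_names: frozenset[str] = frozenset()


def find_drift(staging: EnvSnapshot, production: EnvSnapshot) -> list[str]:
    """Return human-readable descriptions of staging/production drift."""
    problems: list[str] = []

    # Dependency versions should match exactly.
    names = set(staging.dependency_versions) | set(production.dependency_versions)
    for name in sorted(names):
        s = staging.dependency_versions.get(name, "<missing>")
        p = production.dependency_versions.get(name, "<missing>")
        if s != p:
            problems.append(f"dependency {name}: staging={s} production={p}")

    # One side may be ahead, but the shared history must be in the same order.
    shared = min(len(staging.applied_migrations), len(production.applied_migrations))
    if staging.applied_migrations[:shared] != production.applied_migrations[:shared]:
        problems.append("migrations applied in a different order")

    # Values differ by design; the set of variable names should not.
    for name in sorted(staging.env_var_names ^ production.env_var_names):
        where = "staging" if name in staging.env_var_names else "production"
        problems.append(f"environment variable {name} is set only in {where}")

    return problems


if __name__ == "__main__":
    staging = EnvSnapshot(
        {"postgres": "16.2", "caddy": "2.7.6"},
        ["0001_init", "0002_add_plans", "0003_add_flags"],
        frozenset({"DATABASE_URL", "STRIPE_KEY", "STAGING_MODE"}),
    )
    production = EnvSnapshot(
        {"postgres": "16.3", "caddy": "2.7.6"},
        ["0001_init", "0002_add_plans"],
        frozenset({"DATABASE_URL", "STRIPE_KEY"}),
    )
    for problem in find_drift(staging, production):
        print(problem)
```

Comparing variable names rather than values is deliberate. Values are supposed to differ between environments, while a variable that exists in only one of them usually signals an environment-specific code path.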

Data freshness in staging

Data is where staging most often loses the team's trust. There are three approaches:

Synthetic data generated from the production schema. This is the cheapest and safest option, but the most likely to miss bugs that depend on data shape: skew, edge-case values, and real-world data that the synthetic generator never produces.

Anonymized production snapshots. A weekly job dumps production, anonymizes personally identifiable information, and restores to staging. This catches data-shape bugs but requires careful anonymization to avoid accidental leakage of customer information.

Replicated production with redaction. A live replica of production with sensitive fields redacted. The most accurate but the most expensive and the highest legal risk. We don't use this approach.

We use anonymized snapshots on a weekly cadence. The freshness is enough to catch most data-shape bugs without the legal complexity of live replication.
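The anonymization step of a snapshot job could look like the sketch below. The column names (`email`, `full_name`, `phone`) and the choice of keyed hashing are assumptions for illustration. A real job has to cover every sensitive column in its own schema, including free-text fields.

```python
import hashlib
import hmac


def pseudonymize_email(email: str, secret: bytes) -> str:
    """Replace an email with a stable, non-reversible placeholder.

    The same input always maps to the same output, so uniqueness
    constraints and joins on the column keep working in staging.
    """
    normalized = email.strip().lower()
    digest = hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    # ".invalid" is a reserved top-level domain, so mail to it never delivers.
    return f"user-{digest[:16]}@example.invalid"


def anonymize_row(row: dict, secret: bytes) -> dict:
    """Return a copy of a (hypothetical) users row with PII removed."""
    cleaned = dict(row)
    if cleaned.get("email"):
        cleaned["email"] = pseudonymize_email(cleaned["email"], secret)
    # Columns with no structural role are simply blanked.
    for column in ("full_name", "phone"):
        if column in cleaned:
            cleaned[column] = None
    return cleaned


if __name__ == "__main__":
    row = {"id": 7, "email": "Alice@Example.com", "full_name": "Alice A.", "plan": "pro"}
    print(anonymize_row(row, secret=b"rotate-me"))
```

Keyed hashing is pseudonymization rather than full anonymization: anyone holding the secret can test guesses against the output, so the key should stay out of staging itself.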

What the three-tier pattern does not catch

Pre-production environments cannot catch every bug. The classes that escape:

Load bugs. Staging doesn't have production traffic. Some bugs only appear under specific load patterns that synthetic load tests don't reproduce.

Time-dependent bugs. Bugs that depend on the calendar (DST transitions, leap years, end-of-quarter scheduled jobs) often slip through staging because the team's testing window doesn't include the relevant date.

Real-customer-behavior bugs. Customers do things engineers don't anticipate. Some bugs only appear when a real customer's idiosyncratic usage pattern hits the system.
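For the time-dependent class, one partial remedy is to pass the current date into scheduling logic instead of reading the system clock, so the awkward dates can be exercised on any day. The sketch below assumes hypothetical job names and is not taken from the services above.

```python
import calendar
from datetime import date


def is_month_end(today: date) -> bool:
    """True on the last calendar day of the month, leap years included."""
    last_day = calendar.monthrange(today.year, today.month)[1]
    return today.day == last_day


def is_quarter_end(today: date) -> bool:
    """True on the last day of March, June, September, or December."""
    return today.month in (3, 6, 9, 12) and is_month_end(today)


def jobs_due(today: date) -> list[str]:
    """Return the (hypothetical) scheduled jobs that should run on `today`."""
    due = ["daily-report"]
    if is_month_end(today):
        due.append("monthly-usage-summary")
    if is_quarter_end(today):
        due.append("quarterly-invoice-rollup")
    return due


if __name__ == "__main__":
    # Any date can be checked without waiting for the calendar to reach it.
    print(jobs_due(date(2024, 2, 29)))
    print(jobs_due(date(2026, 6, 30)))
```

This covers calendar arithmetic only. DST transitions also need time-zone-aware timestamps, which this sketch does not attempt.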

The mitigation is feature flags for risky changes, gradual rollouts to a subset of customers, and a strong error monitoring discipline. Pre-production environments are the first line of defense, not the only one.
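A gradual rollout needs a stable way to decide which customers see a change. One common technique is to hash the flag name together with the customer ID into a bucket from 0 to 99. The sketch below illustrates that idea; the function names are hypothetical and it does not describe how any particular flag service works.

```python
import hashlib


def rollout_bucket(flag: str, customer_id: str) -> int:
    """Map a (flag, customer) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{customer_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, customer_id: str, percent: int) -> bool:
    """True if the customer falls inside the first `percent` buckets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return rollout_bucket(flag, customer_id) < percent


if __name__ == "__main__":
    customers = [f"cust-{n}" for n in range(1000)]
    enabled = sum(is_enabled("new-billing-page", c, 10) for c in customers)
    print(f"{enabled} of {len(customers)} customers see the change")
```

Including the flag name in the hash means each flag picks a different subset of customers. Raising the percentage only adds customers, so nobody who already has the change loses it mid-rollout.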

The deeper observation

The value of pre-production environments is proportional to the discipline that maintains them. A staging environment that drifts from production is worse than no staging environment, because it gives a false sense of safety. The three-tier pattern works because it gives each tier one job, makes that job easy to verify, and assigns an explicit owner to each tier. The configuration overhead is real, but the alternative, discovering bugs in production, is more expensive than any reasonable configuration cost.
