Backups vs Archives: The Distinction That Saves Money and Reputation

Most teams have a backup story and most teams do not have an archive story, and most teams cannot tell you which is which. The two are different products built for different audiences with different access patterns, and the conflation produces backup systems that are too expensive for what they do and archive systems that cannot actually retrieve what they store. The distinction is worth getting right early because the infrastructure choices compound and the migration cost grows linearly with the data volume.

This post covers the operational definition that distinguishes a backup from an archive, the access-pattern divergence that drives different storage choices, the retention discipline that keeps each cost-effective, the verification practice that distinguishes working from theatrical, and the legal-and-compliance overlay that complicates both.

The operational definition

A backup exists to restore the operational system from a recent point in time. The user is the operator. The access pattern is rare, full reads of the entire backup that must be fast when they happen, which makes time-to-restore the load-bearing metric. The retention horizon is short, days to weeks, because the value of a backup decays rapidly with age. Yesterday's backup is critical; last quarter's backup is largely useless because the operational state has moved.

An archive exists to retrieve specific historical records on demand, possibly years later. The user is a regulator, an auditor, a legal team, a customer support agent, or a data scientist. The access pattern is rare reads of small subsets, with retrieval-time tolerance measured in hours or days rather than minutes. The retention horizon is long — years to decades — driven by legal requirements, contractual obligations, or analytical value.

The distinction matters because the operational requirements diverge: backups need to be fast and complete; archives need to be queryable and durable. A single artifact that tries to do both is too expensive as a backup and too inconvenient as an archive.

Access patterns and storage choices

Backups are read in rare bulk operations: a database restore touches every byte. The right storage tier is one that supports fast, undifferentiated bulk read with redundancy across at least two physical locations. Object storage with multi-region replication, or a dedicated backup service, is typically the right answer. The cost model is bytes-stored-per-month, with retrieval cost being negligible because retrievals are rare and complete.

Archives are read in selective patterns: a regulator asking for a specific transaction from three years ago needs that transaction, not the entire archive. The right storage tier supports indexed retrieval with potentially long retrieval latency in exchange for cheap storage. Cold-tier object storage, glacier-class archives, or specialized archive services are typical. The cost model is bytes-stored at very low rates, with significant retrieval cost when retrievals happen, and the application has to design around the higher latency.

The conflation pattern is using a single warm-tier object store for both. Backups kept on the archive's retention schedule accumulate years of copies nobody will restore, and archives pay warm-tier rates for data that is almost never read. The split-tier pattern uses warm storage with short retention for backups and cold storage with long retention for archives, with each optimized for its actual access pattern.
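To make the two cost models concrete, here is a minimal sketch in Python. The per-GB rates are placeholders chosen only for illustration, not any provider's real prices, and `monthly_cost` and its parameters are hypothetical names.

```python
def monthly_cost(stored_gb, retrieved_gb, storage_rate, retrieval_rate):
    """Monthly bill: GB stored times storage rate, plus GB retrieved times retrieval rate."""
    return stored_gb * storage_rate + retrieved_gb * retrieval_rate

# Placeholder rates in dollars per GB; substitute your provider's actual pricing.
WARM = {"storage_rate": 0.02, "retrieval_rate": 0.0}
COLD = {"storage_rate": 0.002, "retrieval_rate": 0.03}

# An archive: 50 TB stored, 10 GB retrieved in a typical month.
archive_on_warm = monthly_cost(50_000, 10, **WARM)
archive_on_cold = monthly_cost(50_000, 10, **COLD)

# A backup set: 500 GB stored, restored in full once during the month.
backup_on_warm = monthly_cost(500, 500, **WARM)
backup_on_cold = monthly_cost(500, 500, **COLD)

print(f"archive: warm ${archive_on_warm:.2f} vs cold ${archive_on_cold:.2f}")
print(f"backup:  warm ${backup_on_warm:.2f} vs cold ${backup_on_cold:.2f}")
```

Under these made-up rates the archive is roughly ten times cheaper on cold storage, while the backup that gets restored in full costs more on cold storage than on warm. The arithmetic leaves out retrieval latency, which is usually the stronger argument for keeping backups off the cold tier.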

Retention discipline

Backup retention should be aggressive. The 3-2-1 rule (three copies, two media, one offsite) governs how many copies of the current backups exist and where they live, not how long any of them is kept. A reasonable default retention policy is daily backups for two weeks, weekly backups for two months, and monthly backups for a year. Anything beyond a year is in archive territory, because the operational utility has expired. Deleting old backups is the discipline most teams skip and the one that bloats backup costs.
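The daily, weekly, and monthly schedule above can be sketched as a pruning function. This is one possible reading of the policy, not a canonical one: it assumes at most one backup per calendar day, and it picks the earliest backup in each ISO week and each calendar month as the one to keep. The function name and the 14, 60, and 365 day defaults are assumptions made for the example.

```python
from datetime import date, timedelta

def backups_to_keep(backup_dates, today, daily_days=14, weekly_days=60, monthly_days=365):
    """Return the set of backup dates retained under a daily/weekly/monthly schedule."""
    dates = sorted(set(backup_dates))
    first_of_week = {}
    first_of_month = {}
    for d in dates:  # ascending order, so setdefault records the earliest per bucket
        iso = d.isocalendar()
        first_of_week.setdefault((iso[0], iso[1]), d)
        first_of_month.setdefault((d.year, d.month), d)
    weekly = set(first_of_week.values())
    monthly = set(first_of_month.values())

    keep = set()
    for d in dates:
        age = (today - d).days
        if (
            age < daily_days                          # every backup from the last two weeks
            or (d in weekly and age < weekly_days)    # one per week for about two months
            or (d in monthly and age < monthly_days)  # one per month for a year
        ):
            keep.add(d)
    return keep

# Usage: 400 consecutive daily backups ending on 2024-06-30.
today = date(2024, 6, 30)
all_backups = [today - timedelta(days=i) for i in range(400)]
keep = backups_to_keep(all_backups, today)
to_delete = sorted(set(all_backups) - keep)
print(f"keeping {len(keep)} of {len(all_backups)} backups")
```

Everything in `to_delete` is a candidate for removal. A production version would also want a floor, such as never deleting the most recent backup that passed a restore drill.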

Archive retention is driven by external requirements: tax and financial records commonly seven years, healthcare records ten or twenty depending on jurisdiction, and legal holds indefinitely. For personal data, the GDPR right to erasure adds a competing deletion obligation, although it does not automatically override retention that another law requires. The retention policy is a legal artifact rather than an engineering one, and the engineering job is to implement what legal specifies rather than to invent a policy from the engineering team's intuition.

The interaction between archive retention and GDPR is the operational complication that surprises teams. The right-to-erasure obligation requires deletion of personal data on request, but the legal-records obligation may require retention of transaction history that includes personal data. The reconciliation requires either pseudonymization at archive time so the personal data can be deleted while preserving transaction integrity, or per-record cryptographic erasure so the personal data is destroyed while the record metadata persists. Both approaches require designing the archive schema with this requirement in mind from the start.
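A minimal sketch of the pseudonymization approach, assuming a hypothetical `IdentityVault` that lives outside the archive: archived records carry only a random token, and erasing a person means deleting the vault entry that links the token to them. The class, field names, and in-memory storage are illustrative, and whether this design satisfies a particular erasure obligation is a question for legal, not for the code.

```python
import secrets

class IdentityVault:
    """Holds the only link between pseudonym tokens and personal data."""

    def __init__(self):
        self._token_by_email = {}
        self._person_by_token = {}

    def pseudonym_for(self, email, name):
        key = email.strip().lower()
        token = self._token_by_email.get(key)
        if token is None:
            token = secrets.token_hex(8)  # random, so it cannot be recomputed from the email
            self._token_by_email[key] = token
            self._person_by_token[token] = {"email": email, "name": name}
        return token

    def resolve(self, token):
        return self._person_by_token.get(token)

    def erase(self, email):
        """Delete the link; archived records keep the now-meaningless token."""
        token = self._token_by_email.pop(email.strip().lower(), None)
        if token is None:
            return False
        del self._person_by_token[token]
        return True

def archive_transaction(archive, vault, txn_id, email, name, amount_cents):
    # Only the pseudonym is written to the archive, never the email or name.
    archive.append({
        "txn_id": txn_id,
        "customer": vault.pseudonym_for(email, name),
        "amount_cents": amount_cents,
    })

vault = IdentityVault()
archive = []
archive_transaction(archive, vault, "t-1", "ada@example.com", "Ada", 1250)
archive_transaction(archive, vault, "t-2", "ada@example.com", "Ada", 900)
vault.erase("ada@example.com")
```

After the erase call, both transactions remain in the archive and still share a customer token, so the transaction history stays intact, but the token no longer resolves to a person. This only works if no other field in the archived record identifies the customer, which is why the schema has to be designed for it up front.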

Verification

An untested backup is a story about a backup. The discipline that distinguishes working backups from backup theater is the regular restore drill, performed end-to-end against a production-shaped test environment, with success criteria that are pre-defined rather than retrospectively rationalized.

The drill cadence should track the recovery-time objective: the tighter the RTO, the less room there is to discover a broken restore path during the incident itself. A system with a 4-hour RTO needs monthly drills. A system with a 24-hour RTO might survive on quarterly drills. Each drill should produce a measured restore time, a data-integrity check against a known reference, and an operational validation that the restored system can serve traffic.
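The three drill outputs can be captured in a small harness. The sketch below assumes the restore, integrity check, and smoke test are supplied as callables; `run_restore_drill` and `DrillResult` are hypothetical names, and the lambdas in the usage section are stand-ins for real restore tooling.

```python
import time
from dataclasses import dataclass

@dataclass
class DrillResult:
    restore_seconds: float
    within_rto: bool
    integrity_ok: bool
    serves_traffic: bool

    @property
    def passed(self):
        # Success criteria are fixed before the drill runs, not argued afterwards.
        return self.within_rto and self.integrity_ok and self.serves_traffic

def run_restore_drill(restore, check_integrity, smoke_test, rto_seconds):
    """Time the restore, then run the integrity and operational checks on the result."""
    started = time.monotonic()
    restored = restore()
    elapsed = time.monotonic() - started
    return DrillResult(
        restore_seconds=elapsed,
        within_rto=elapsed <= rto_seconds,
        integrity_ok=bool(check_integrity(restored)),
        serves_traffic=bool(smoke_test(restored)),
    )

# Usage with stand-in callables; a real drill would restore into a test environment.
reference = {"orders": 3}
result = run_restore_drill(
    restore=lambda: {"orders": 3},
    check_integrity=lambda db: db == reference,
    smoke_test=lambda db: "orders" in db,
    rto_seconds=4 * 3600,
)
print(result)
```

Recording `restore_seconds` on every drill also gives a trend line, which shows when growing data volume is about to push the restore past the RTO.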

Archive verification is structurally different. The volume is too large for full restore, so the verification has to be sampling-based: pull a random sample of records, verify integrity, verify queryability through the actual access path the application exposes. The frequency can be lower because the access pattern is naturally sampled — the regulator's request is itself a verification event — but the discipline is to do explicit verification on a schedule rather than relying on production access to surface problems.
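Sampling-based verification might look like the following sketch. It assumes a manifest of SHA-256 digests was written at archive time and that `fetch` goes through the same retrieval path the application exposes; both are assumptions, as are the function and variable names.

```python
import hashlib
import random

def verify_sample(manifest, fetch, sample_size, seed=None):
    """Check a random sample of archived records against their recorded digests.

    manifest: dict mapping record id to the expected SHA-256 hex digest.
    fetch: callable taking a record id and returning bytes, or None if missing.
    Returns (chosen_ids, failures), where failures is a list of (id, reason).
    """
    rng = random.Random(seed)
    ids = sorted(manifest)
    chosen = rng.sample(ids, min(sample_size, len(ids)))
    failures = []
    for record_id in chosen:
        data = fetch(record_id)
        if data is None:
            failures.append((record_id, "missing"))
        elif hashlib.sha256(data).hexdigest() != manifest[record_id]:
            failures.append((record_id, "checksum mismatch"))
    return chosen, failures

# Usage with an in-memory stand-in for the archive.
store = {f"r{i}": f"payload {i}".encode() for i in range(100)}
manifest = {rid: hashlib.sha256(data).hexdigest() for rid, data in store.items()}
chosen, failures = verify_sample(manifest, store.get, sample_size=10, seed=1)
print(f"checked {len(chosen)} records, {len(failures)} failures")
```

A record that is present but cannot be fetched through the application's path counts as a failure here, which is the point: the check exercises retrieval, not just storage.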

The classic failure modes for both: backups that succeed at the storage layer but fail at the recovery layer because of schema drift; backups that include the database but not the configuration needed to run it; archives that are present but unindexed and effectively unrecoverable; archives whose retrieval mechanism has degraded over years without being exercised. The drill is what catches each of these before the production event.

Legal and compliance overlay

The legal overlay turns both backups and archives into compliance artifacts. The discoverable-data status of backups during litigation is a serious operational consideration: backups are typically discoverable, which means that data the application has deleted may still exist in backups and may have to be produced. The right policy is to align backup retention with operational data retention so that deleted data ages out of backups within a defined window.

The encryption-at-rest requirement applies to both, with key management as the operational risk. Encrypted backups whose keys have been lost are equivalent to no backups, so the key-storage policy must be separate from the backup-storage policy. Hardware security modules, separate cloud accounts, or escrowed keys with multiple holders are the typical answers; the wrong answer is keys stored alongside the data they encrypt.

The chain-of-custody requirement applies more strongly to archives than to backups, because archives are more likely to be evidence in legal proceedings. The discipline is logging every access to archive records with actor and timestamp, ideally in tamper-evident form, so that the integrity of the record can be defended in court.
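One common way to make an access log tamper-evident is a hash chain, where each entry commits to the hash of the entry before it. The sketch below is an illustration of that technique under simple assumptions (an in-memory list, caller-supplied timestamps, hypothetical function names), not a complete audit system.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(entry):
    # Hash only the logical fields, in a stable key order.
    body = {k: entry[k] for k in ("actor", "record_id", "timestamp", "prev_hash")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_access(log, actor, record_id, timestamp):
    """Append an access entry that commits to the previous entry's hash."""
    entry = {
        "actor": actor,
        "record_id": record_id,
        "timestamp": timestamp,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry["hash"] = _digest(entry)
    log.append(entry)
    return entry

def verify_chain(log):
    """Return True if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(entry):
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_access(log, "auditor@example.com", "txn-1001", "2024-03-01T09:00:00Z")
append_access(log, "support@example.com", "txn-1002", "2024-03-01T09:05:00Z")
append_access(log, "legal@example.com", "txn-1001", "2024-03-02T14:30:00Z")
print(verify_chain(log))
```

The chain only detects edits made by someone who cannot also rewrite every later entry, so the latest hash should be recorded periodically somewhere the log writer cannot modify.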

The Anethoth implementation

The pattern across the four Anethoth products is uniform: SQLite snapshot-encrypt-upload to multi-region object storage on a daily schedule, with two-week retention for daily backups and one-year retention for monthly backups. The archive layer for transaction records, audit logs, and webhook captures uses cold-tier object storage with seven-year retention. The drill cadence is monthly for the backup side and quarterly for the archive side, with the drill scripts checked into the same repository as the backup scripts so they cannot drift apart.
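The snapshot step of such a pipeline can be sketched with Python's standard `sqlite3` module, whose `Connection.backup` method copies a live database consistently. This is not the actual Anethoth script: the function name and file paths are hypothetical, and the encrypt and upload steps are left out because the tooling for them is not specified here.

```python
import hashlib
import os
import sqlite3
import tempfile

def snapshot_sqlite(source_path, snapshot_path):
    """Copy a live SQLite database to snapshot_path and return the file's SHA-256."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(snapshot_path)
    try:
        src.backup(dst)  # consistent copy even if the source is being written to
        ok = dst.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    finally:
        dst.close()
        src.close()
    if not ok:
        raise RuntimeError("snapshot failed integrity check")
    with open(snapshot_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Usage against a throwaway database.
with tempfile.TemporaryDirectory() as tmp:
    live = os.path.join(tmp, "live.db")
    snap = os.path.join(tmp, "snapshot.db")
    conn = sqlite3.connect(live)
    conn.execute("CREATE TABLE pings (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO pings (status) VALUES ('ok')")
    conn.commit()
    conn.close()

    digest = snapshot_sqlite(live, snap)

    check = sqlite3.connect(snap)
    restored_rows = check.execute("SELECT COUNT(*) FROM pings").fetchone()[0]
    check.close()

print(digest, restored_rows)
```

Storing the digest next to the uploaded object gives the restore drill a known reference to check the downloaded snapshot against.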

The split between DocuMint, CronPing, FlagBit, and WebhookVault is product-by-product, with each product running its own backup pipeline and its own archive pipeline, but the storage backend is shared so the operational cost stays manageable. The key-management is centralized in a separate cloud account that nothing else has access to, with two human escrow holders for the master keys.

The discipline of treating backups and archives as different products with different requirements is the unglamorous engineering that makes the difference between data that is recoverable and data that is theoretically recoverable. The cost of getting it right is low, the cost of getting it wrong is occasionally company-ending, and the discipline compounds because the storage choices made early determine what is operationally feasible later.