Field-Level Encryption: Patterns for Encrypting Specific Database Columns

Disk-level encryption protects data at rest from physical theft. TLS protects it in transit. Neither protects sensitive fields from the queries you accidentally write or from compromised application code. Field-level encryption fills the gap, with patterns and costs that matter.

Disk-level encryption protects against an attacker who steals a hard drive. TLS protects against an attacker on the network path. Neither protects against the threats most application teams actually face: a SQL injection that returns customer Social Security numbers, an over-privileged internal tool that exposes API keys in audit logs, a backup file dumped to S3 without encryption, or a developer running an analytics query that joins customer emails into a Slack notification. Field-level encryption — encrypting specific columns within an otherwise-readable database — closes the gap. It comes with real costs that determine where it is the right choice and where it is the wrong one.

We use field-level encryption selectively across DocuMint, CronPing, FlagBit, and WebhookVault — for items where the encryption cost is justified by the threat model. Most fields do not need it. The patterns for the ones that do are worth understanding in detail.

What field-level encryption buys

The fundamental property is that even if an attacker reads the database directly — via a backup, a snapshot, a SQL injection, or a misconfigured replica — the encrypted columns are unreadable without access to the encryption key, which lives somewhere outside the database.

The threat model is specifically: attacker has database read access, attacker does not have application-server read access, attacker does not have access to the key management service. The first condition is realistic (backups, replicas, SQL injection, insider threats). The second condition is the boundary: if the attacker has both database access and application access, field-level encryption helps less, because the application has the key.

The right candidate fields are those that: are highly sensitive, are rarely queried by content (you almost always look them up by ID), do not need to participate in indexes or joins, and do not need to be aggregated. Examples: API key plaintext (in the brief window before discarding), customer Social Security numbers, payment account numbers, encrypted message bodies, OAuth refresh tokens stored on behalf of customers.

What it does not buy

It does not protect against an attacker who controls the application server. It does not enable searching by encrypted field content — at least not without significantly more complex schemes like searchable encryption, which have their own trade-offs. It does not eliminate the need for backups, access controls, network restrictions, and audit logging. And it does not solve the key management problem; it moves it to a place where you can hopefully solve it well.

The basic pattern

A minimum-viable implementation uses authenticated encryption (AES-256-GCM or similar), with the key fetched from a key management service at application startup. The encrypted value, the IV, and a key identifier are stored together in a single column:

from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import os, base64, json

class FieldEncryptor:
    def __init__(self, key_bytes, key_id):
        self.cipher = AESGCM(key_bytes)
        self.key_id = key_id

    def encrypt(self, plaintext: str) -> str:
        iv = os.urandom(12)
        ct = self.cipher.encrypt(iv, plaintext.encode(), None)
        envelope = {
            "v": 1,
            "kid": self.key_id,
            "iv": base64.b64encode(iv).decode(),
            "ct": base64.b64encode(ct).decode(),
        }
        return json.dumps(envelope, separators=(',', ':'))

    def decrypt(self, ciphertext: str) -> str:
        envelope = json.loads(ciphertext)
        iv = base64.b64decode(envelope["iv"])
        ct = base64.b64decode(envelope["ct"])
        return self.cipher.decrypt(iv, ct, None).decode()

The envelope format — version, key ID, IV, ciphertext — is the load-bearing part. It enables key rotation (you can decrypt with old keys while encrypting with new ones), algorithm migration (a v2 envelope can use a different cipher), and forward compatibility. Storing just the raw ciphertext is the rookie mistake that forces a migration when any of these change.

Key management is the real problem

The encryption itself is the easy part. Key management is where field-level encryption succeeds or fails. The key cannot live in source code, cannot live in the database next to the encrypted data, and ideally cannot live in unencrypted environment variables on disk.

The canonical pattern is a cloud KMS — AWS KMS, Google Cloud KMS, Azure Key Vault. The application uses IAM credentials to fetch the key on startup. The KMS provides audit logs, automated rotation, and access controls that the database does not.

The lighter-weight pattern, which we use at our scale, is envelope encryption with a master key in a chmod-600 environment file outside the application directory. The master key encrypts per-row data keys, which encrypt the actual fields. The master key never appears in logs, error messages, or backups. The trade-off is that master key compromise compromises all encrypted data, while a true KMS can provide additional isolation.

The wrong pattern is storing the key in the same database as the encrypted data, in source control, or in an unencrypted backup. The threat model collapses if the attacker can grab both the data and the key in a single breach.

Key rotation

Rotation is the part most field-level encryption schemes get wrong by omission. The key must eventually be rotated — because of a suspected compromise, a regulatory requirement, or simple hygiene. Rotation requires: (1) being able to decrypt with the old key, (2) re-encrypting with the new key, and (3) eventually removing the old key once nothing uses it.

The envelope format with a kid field makes step (1) trivial: the decryptor looks up the key by ID. Step (2) is a background job that scans encrypted rows, decrypts with the old key, and re-encrypts with the new key, in chunks, with progress tracking. Step (3) requires confidence that no envelope with the old kid remains, which requires either complete migration or per-row tracking.

For our internal API key storage in WebhookVault and Copilot, we use a rolling-window rotation: new keys encrypted with the current master, old keys gradually re-encrypted via background scanner. A failure during rotation does not lose data because the envelope format is self-describing.

Performance cost

AES-256-GCM on modern CPUs with AES-NI instructions is fast — well under a microsecond per kB. The cost that actually matters is the round-trip to a cloud KMS for the decryption operation, which is typically 5-50 milliseconds. The mitigation is local caching of decrypted data keys with a short TTL, which amortizes the KMS round-trip across many decryptions.

For our pattern with a local master key, the cost is essentially the AES cost, which disappears in noise compared to database round-trips. For a cloud KMS pattern, the cost matters and requires caching strategy.

What we do not encrypt

The list of fields we do not encrypt is longer than the list we do, and the reasoning is worth being explicit about. Email addresses are not encrypted: they need to be searchable and they appear in dozens of operational queries. Customer names are not encrypted for the same reason. Created-at timestamps are not encrypted: they need to be indexable and queryable. Plan tier, monthly usage counters, and product configuration are not encrypted: they are not particularly sensitive and they participate in every query.

The encrypted fields are: API key plaintexts (we hash; this is in the rare case where we need to display once), credential storage in Copilot's vault, request bodies in WebhookVault that customers have opted to retain, and similar narrow categories. The discipline is that every encrypted field has an explicit threat-model justification, not a default-on policy.

The deeper observation

Field-level encryption is one of those security mechanisms whose value depends entirely on the rest of the system being correctly designed. If access control is sloppy, encryption only delays the breach. If key management is sloppy, encryption is theater. If the threat model is unclear, encryption is encryption-of-nothing. The decision to add it should be specific to fields where the threat model is articulated and the cost of the operational complexity is paid for by a clear reduction in real risk. Default-on encryption for every column is the same anti-pattern as default-off logging or default-anything across the board — it indicates a team that has not thought about the actual tradeoffs and is hoping that doing more will compensate for not knowing which more matters.

Read more