engineering Dispatch 5 min read · 6 May 2026

Notification System Design: Email, SMS, Push, and the Patterns That Survive

Notifications are the system that grows in scope faster than any other in a SaaS application. The first version is a single email send from the signup endpoint. By the time a product has any traction, it has password resets, invoice receipts, security alerts, marketing campaigns, in-app notifications, mobile push, and SMS for critical events. Each new channel and each new notification type added to a system designed for one use case stretches the abstraction further. The patterns that survive the growth path are the ones that treat the channel-explosion and the type-explosion as orthogonal concerns from the start.

This post covers the events-as-source-of-truth abstraction, the channel-adapter pattern, user preference modeling that handles real-world consent and frequency requirements, the throttling and digest patterns that prevent notification fatigue, delivery tracking and the bounce-handling discipline, and the operational signals that distinguish working notification infrastructure from broken infrastructure.

Events as source of truth

The first architectural decision is whether the notification system speaks in terms of channels or events. The channel-first design has functions like send_email, send_sms, send_push that are called directly from application code. This works for the first three notifications and breaks at scale because every change to a notification — a new channel, a new template, a frequency cap — requires touching every call site.

The event-first design has the application emit events like invoice_paid, password_reset_requested, monitor_failed, and a separate notification subsystem that decides which channels to fire based on the event type, the user's preferences, and the system's policies. The application code is decoupled from the channel; adding mobile push later is a notification-subsystem change with zero application changes.

The event schema we have used is uniform: (event_type, user_id, occurred_at, context, dedup_key) with a JSON context column carrying whatever the channel templates need to render. The dedup_key handles idempotent retries — re-emitting the same event collapses to a single notification. The event log is append-only and serves the same audit-trail role as a billing meter.
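A minimal sketch of that schema and the dedup behavior, assuming an in-memory list stands in for the append-only table. The `EventLog` class and the `invoice_paid:inv_42` key format are illustrative assumptions, not part of any particular system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class Event:
    event_type: str
    user_id: str
    occurred_at: datetime
    context: dict[str, Any]  # whatever the channel templates need to render
    dedup_key: str


class EventLog:
    """Append-only, in-memory stand-in for the event table (hypothetical)."""

    def __init__(self) -> None:
        self._events: list[Event] = []
        self._seen: set[str] = set()

    def emit(self, event: Event) -> bool:
        """Append the event; return False if its dedup_key was already recorded."""
        if event.dedup_key in self._seen:
            return False
        self._seen.add(event.dedup_key)
        self._events.append(event)
        return True

    def __len__(self) -> int:
        return len(self._events)


log = EventLog()
paid = Event(
    event_type="invoice_paid",
    user_id="u_1",
    occurred_at=datetime(2026, 5, 6, 12, 0, tzinfo=timezone.utc),
    context={"invoice_id": "inv_42", "amount_cents": 1900},
    dedup_key="invoice_paid:inv_42",
)
first = log.emit(paid)
retry = log.emit(paid)  # a retried emit collapses into the first
```

In a real database the same effect usually comes from a unique constraint on dedup_key rather than an in-process set.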

The channel-adapter pattern

Each channel (email, SMS, push, in-app, webhook) implements a uniform adapter interface: render(event, user_preferences) -> message; deliver(message) -> delivery_receipt. The render step is template-based with channel-specific markup. The deliver step handles the channel's idiosyncratic delivery requirements: SMTP for email, Twilio API for SMS, FCM/APNs for push, database insert for in-app.

The adapter pattern lets channels evolve independently. The email adapter can switch ESPs without touching SMS code. The push adapter can add iOS-specific rich notifications without changing the event schema. Each adapter has its own retry policy, its own delivery-tracking semantics, and its own rate limits. The notification subsystem is the thin layer that routes events to adapters; the adapters are where the per-channel complexity lives.
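The interface and the thin routing layer can be sketched as follows. This is an illustration under assumptions: `Message`, `DeliveryReceipt`, `InAppAdapter`, and `route` are hypothetical names, events are plain dicts, and a Python list stands in for the in-app database table.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Message:
    channel: str
    recipient: str
    body: str


@dataclass
class DeliveryReceipt:
    channel: str
    recipient: str
    status: str


class ChannelAdapter(Protocol):
    def render(self, event: dict, user_preferences: dict) -> Message: ...
    def deliver(self, message: Message) -> DeliveryReceipt: ...


class InAppAdapter:
    """In-app delivery is a database insert; a list stands in for the table."""

    def __init__(self) -> None:
        self.inbox: list[Message] = []

    def render(self, event: dict, user_preferences: dict) -> Message:
        body = f"{event['event_type']}: {event['context'].get('summary', '')}"
        return Message("in_app", event["user_id"], body)

    def deliver(self, message: Message) -> DeliveryReceipt:
        self.inbox.append(message)
        return DeliveryReceipt(message.channel, message.recipient, "delivered")


def route(event: dict, prefs: dict, adapters: dict[str, ChannelAdapter]) -> list[DeliveryReceipt]:
    """Thin routing layer: pick the enabled channels, delegate to their adapters."""
    receipts = []
    for name, adapter in adapters.items():
        if prefs.get(name, True):  # assumption: channels default to enabled
            receipts.append(adapter.deliver(adapter.render(event, prefs)))
    return receipts
```

Adding an email or push adapter means writing one more class with the same two methods; `route` and the event shape stay unchanged.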

The cross-cutting concern that has to be in the notification subsystem rather than the adapter is sequencing. A user signing up should receive the welcome email before the first product notification, even if both events fire in the same second. The notification subsystem handles this with per-user serialization at the queue level, which is one of the small operational complexities that justifies a dedicated subsystem rather than channel-by-channel ad hoc code.
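One common way to get per-user serialization is to partition the queue by user, so a single worker sees all of a user's events in emission order. The sketch below assumes that approach; `PartitionedQueue` and the `user_signed_up` event name are hypothetical, and a production system would typically lean on the broker's own partition or message-group key instead.

```python
import zlib
from collections import deque


class PartitionedQueue:
    """Routes every event for a user to the same FIFO partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [deque() for _ in range(num_partitions)]

    def partition_for(self, user_id: str) -> int:
        # crc32 is stable across processes, unlike Python's built-in hash().
        return zlib.crc32(user_id.encode("utf-8")) % len(self.partitions)

    def enqueue(self, event: dict) -> int:
        index = self.partition_for(event["user_id"])
        self.partitions[index].append(event)
        return index

    def drain(self, index: int) -> list:
        """What a single worker bound to this partition would process, in order."""
        drained = []
        while self.partitions[index]:
            drained.append(self.partitions[index].popleft())
        return drained


queue = PartitionedQueue(num_partitions=4)
p1 = queue.enqueue({"user_id": "u_1", "event_type": "user_signed_up"})
queue.enqueue({"user_id": "u_2", "event_type": "invoice_paid"})
p2 = queue.enqueue({"user_id": "u_1", "event_type": "monitor_failed"})
order = [e["event_type"] for e in queue.drain(p1) if e["user_id"] == "u_1"]
```

Because both u_1 events land in the same partition, the welcome-triggering signup event is always handled before the later product event, regardless of how many workers run in parallel.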

User preferences as first-class data

The user preference model is the part most teams underbuild. The naive design is a single boolean per channel: email_notifications, sms_notifications, push_notifications. This breaks as soon as requirements arrive for separate marketing and transactional toggles, then per-event-type toggles, then digest versus immediate frequency, then timezone-aware quiet hours, then jurisdictional consent records.

The model that handles all of these without reshaping is a preferences table with rows per (user, event_type, channel), with columns for enabled, frequency (immediate/digest), quiet_hours, and consent_metadata. The default policy is encoded as the absence of an explicit row. New event types can be added without migrating user preference data; defaults apply until the user explicitly overrides.
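The "absence of a row means default" rule can be sketched as a two-step lookup. Everything named here is an assumption for illustration: the `Preference` fields mirror the columns above, `DEFAULTS` and `FALLBACK` encode a hypothetical default policy, and a dict stands in for the preferences table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Preference:
    enabled: bool
    frequency: str  # "immediate" or "digest"
    quiet_hours: Optional[tuple[int, int]] = None  # local (start_hour, end_hour)


# Default policy per (event_type, channel); applies when no explicit row exists.
DEFAULTS = {
    ("invoice_paid", "email"): Preference(True, "immediate"),
    ("comment_added", "email"): Preference(True, "digest"),
}
FALLBACK = Preference(False, "immediate")  # assumption: unknown pairs are off

# Stand-in for the preferences table: one row per (user, event_type, channel).
overrides = {
    ("u_1", "comment_added", "email"): Preference(False, "digest"),
}


def resolve(user_id: str, event_type: str, channel: str) -> Preference:
    explicit = overrides.get((user_id, event_type, channel))
    if explicit is not None:
        return explicit
    return DEFAULTS.get((event_type, channel), FALLBACK)


def in_quiet_hours(local_hour: int, quiet_hours: Optional[tuple[int, int]]) -> bool:
    """True if the user's local hour falls in the window; handles wrap past midnight."""
    if quiet_hours is None:
        return False
    start, end = quiet_hours
    if start <= end:
        return start <= local_hour < end
    return local_hour >= start or local_hour < end
```

A new event type only needs an entry in the default policy; no user rows have to be migrated.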

The consent_metadata column matters legally. GDPR, CAN-SPAM, and TCPA all have specific requirements about consent capture and proof. The metadata column records the consent text the user agreed to, the timestamp, the IP address, and the application version that captured the consent. This is the data the legal team will ask for in two years; recording it from day one is much easier than reconstructing it later.

Throttling and digests

Notification fatigue is the failure mode that erodes deliverability and trust. A user who receives 50 emails per day from one service will start marking them as spam, and those complaints damage the sender's reputation, so the service's legitimate transactional mail starts landing in spam too. The mitigations are throttling, digests, and importance scoring.

Throttling limits the rate of notifications per channel per user. The right windows are short — five notifications per hour, twenty per day — with the burst capacity tuned per channel. Email can absorb more bursts than push notifications. Throttling is implemented in the notification subsystem before the adapter is called, with the throttled events accumulating into a digest.
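A sliding-window limiter is one way to implement this. The sketch below is an assumption about shape rather than a reference implementation: `Throttle` is a hypothetical class, state is held in memory, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import defaultdict, deque


class Throttle:
    """Sliding-window limit per (user, channel); overflow is held for a digest."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._sent = defaultdict(deque)        # (user, channel) -> send timestamps
        self.digest_buffer = defaultdict(list)  # (user, channel) -> throttled events

    def admit(self, user_id: str, channel: str, event: dict, now: float) -> bool:
        key = (user_id, channel)
        sent = self._sent[key]
        # Drop timestamps that have aged out of the window.
        while sent and now - sent[0] >= self.window:
            sent.popleft()
        if len(sent) < self.limit:
            sent.append(now)
            return True
        # Over the limit: do not call the adapter, accumulate for the digest.
        self.digest_buffer[key].append(event)
        return False


throttle = Throttle(limit=5, window_seconds=3600)
results = [throttle.admit("u_1", "email", {"n": i}, now=float(i)) for i in range(7)]
```

With a limit of five per hour, the first five events go to the adapter and the last two wait in `digest_buffer`. A daily cap would be a second `Throttle` instance checked alongside the hourly one.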

The digest pattern collapses multiple events into a single notification on a schedule. The user with five new comments on their thread can get one email summarizing all five rather than five separate emails. The digest is a separate event type emitted by the notification subsystem, with the underlying events as context. Implementing digests well requires the event log to support efficient queries by user and time window, which the schema above supports.
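As an illustration, a digest builder can group the events in a time window by user and emit one digest event per user in the same schema. The function name `build_digests`, the `digest` event type string, and the dedup key format are assumptions; events are plain dicts, and a real implementation would run the window filter as an indexed query rather than a Python loop.

```python
from datetime import datetime, timedelta, timezone


def build_digests(events: list, window_start: datetime, window_end: datetime) -> list:
    """Collapse each user's events in [window_start, window_end) into one digest event."""
    grouped = {}
    for event in events:
        if window_start <= event["occurred_at"] < window_end:
            grouped.setdefault(event["user_id"], []).append(event)

    digests = []
    for user_id, items in grouped.items():
        items.sort(key=lambda e: e["occurred_at"])
        digests.append({
            "event_type": "digest",
            "user_id": user_id,
            "occurred_at": window_end,
            "context": {"count": len(items), "events": items},
            # Re-running the same window produces the same key, so it dedups.
            "dedup_key": f"digest:{user_id}:{window_end.isoformat()}",
        })
    return digests


start = datetime(2026, 5, 6, 9, 0, tzinfo=timezone.utc)
end = start + timedelta(hours=1)


def comment(user_id: str, minutes: int) -> dict:
    return {"event_type": "comment_added", "user_id": user_id,
            "occurred_at": start + timedelta(minutes=minutes), "context": {}}


events = [comment("u_1", 30), comment("u_1", 5), comment("u_2", 10), comment("u_1", 75)]
digests = {d["user_id"]: d for d in build_digests(events, start, end)}
```

The event at minute 75 falls outside the hour and is left for the next window.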

Importance scoring is the partial-ordering layer. Critical notifications — security alerts, payment failures, monitor outages — bypass throttling and digests. The score is set per event type at emission time, and the notification subsystem treats it as a routing input. The category most teams get wrong is the one in the middle: notifications that are not critical but are time-sensitive, like a teammate's @-mention. The honest answer is that these usually deserve real-time delivery on at least one channel and digest on others, with the user's preferences determining which.
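The three tiers can be sketched as a small routing function. The tier names, the three mode strings, and the fallback to the first enabled channel are all assumptions made for illustration; the text above only fixes the behavior that critical events bypass throttling and that time-sensitive events go real-time on at least one channel.

```python
from enum import IntEnum


class Importance(IntEnum):
    ROUTINE = 0
    TIME_SENSITIVE = 1
    CRITICAL = 2


def plan_delivery(importance: Importance, enabled_channels: list, realtime_channel: str) -> dict:
    """Return {channel: mode}, where mode is "immediate", "digest", or "throttled".

    "immediate" bypasses throttling; "throttled" takes the normal path and may
    end up in a digest; "digest" goes straight to the digest buffer.
    """
    if importance == Importance.CRITICAL:
        return {c: "immediate" for c in enabled_channels}
    if importance == Importance.TIME_SENSITIVE:
        if not enabled_channels:
            return {}
        # Real-time on the user's preferred channel, digest everywhere else.
        live = realtime_channel if realtime_channel in enabled_channels else enabled_channels[0]
        return {c: ("immediate" if c == live else "digest") for c in enabled_channels}
    return {c: "throttled" for c in enabled_channels}


mention_plan = plan_delivery(Importance.TIME_SENSITIVE, ["email", "push"], "push")
```

For an @-mention with push as the preferred real-time channel, the plan is an immediate push and a digested email.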

Delivery tracking and bounces

Delivery tracking is the feedback loop that distinguishes a working notification system from one that silently drops events. Every notification emitted to an external channel produces a delivery receipt with status (sent, delivered, bounced, complained, clicked, opened). The receipts are correlated back to the original event via a delivery_id token threaded through the channel.

The bounce-handling discipline has direct deliverability consequences. Hard bounces — email-account-doesn't-exist — must be respected immediately by suppressing the address. Soft bounces — temporary failures — should retry with exponential backoff and convert to hard bounces after some threshold. Complaints — user marked as spam — must be respected like hard bounces and should also feed into a re-engagement decision: if a user is complaining, the notification system should reduce or stop sending to them.
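These rules can be sketched as a small state machine over delivery statuses. The `BounceHandler` class, the status strings, the threshold of three soft bounces, and the 60-second base delay are illustrative assumptions; the thresholds in a real system depend on the provider and the traffic.

```python
from typing import Optional


class BounceHandler:
    """Tracks a suppression list and soft-bounce retry state per address."""

    def __init__(self, soft_limit: int = 3, base_delay: float = 60.0) -> None:
        self.soft_limit = soft_limit
        self.base_delay = base_delay
        self.suppressed: set = set()
        self._soft_counts: dict = {}

    def can_send(self, address: str) -> bool:
        return address not in self.suppressed

    def handle(self, address: str, status: str) -> Optional[float]:
        """Return the retry delay in seconds, or None when no retry is due."""
        if status in ("hard_bounce", "complained"):
            self.suppressed.add(address)  # respected immediately
            return None
        if status == "soft_bounce":
            count = self._soft_counts.get(address, 0) + 1
            self._soft_counts[address] = count
            if count >= self.soft_limit:
                self.suppressed.add(address)  # converted to a hard bounce
                return None
            return self.base_delay * 2 ** (count - 1)  # exponential backoff
        # Any successful delivery resets the soft-bounce streak.
        self._soft_counts.pop(address, None)
        return None


handler = BounceHandler(soft_limit=3, base_delay=60.0)
delays = [handler.handle("a@example.com", "soft_bounce") for _ in range(3)]
```

With these settings the first two soft bounces schedule retries at 60 and 120 seconds, and the third suppresses the address.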

The deliverability metrics that matter are bounce rate and complaint rate. Commonly cited ceilings are a bounce rate under 5% and a complaint rate under 0.3%; Gmail's bulk-sender guidelines treat 0.3% as the level never to reach and recommend staying below 0.1%. Above these thresholds, the sender reputation degrades and legitimate transactional mail starts landing in spam. The List-Unsubscribe header and the One-Click Unsubscribe POST endpoint are now required by Gmail and Yahoo for senders above 5,000 messages per day, and the right time to add them is before they are required.
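For the unsubscribe headers, a minimal sketch using Python's standard `email` library looks like this. The two header names and the `List-Unsubscribe=One-Click` value come from RFC 8058; the `build_message` helper, the addresses, and the unsubscribe URL are hypothetical. The URL must accept a POST and unsubscribe the recipient without further interaction.

```python
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str,
                  unsubscribe_url: str) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # RFC 8058 one-click unsubscribe: an HTTPS URI plus the POST marker header.
    msg["List-Unsubscribe"] = f"<{unsubscribe_url}>"
    msg["List-Unsubscribe-Post"] = "List-Unsubscribe=One-Click"
    msg.set_content(body)
    return msg


msg = build_message(
    sender="updates@example.com",
    recipient="user@example.org",
    subject="Weekly summary",
    body="Five new comments on your thread.",
    unsubscribe_url="https://example.com/unsubscribe/tok_abc",  # hypothetical endpoint
)
```

Putting the headers in the email adapter's render step means every template gets them without per-template work.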

Operational signals

The operational signals that distinguish working notification infrastructure from broken infrastructure are: queue depth (rising means the senders cannot keep up), bounce rate (rising means address quality has degraded or sender reputation is failing), complaint rate (rising means content is becoming unwelcome), open and click rates (per channel and per event type), and event-to-delivery latency (rising means the queue is backed up or the channel is degraded).
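Several of these signals fall out of the delivery receipts directly. The sketch below assumes each receipt is a dict with a `status` and a `latency_s` field (event-to-delivery latency in seconds); those field names, the `summarize` helper, and the nearest-rank percentile are illustrative choices, not a description of any particular dashboard.

```python
def p99(values: list) -> float:
    """Nearest-rank 99th percentile; expects a non-empty list."""
    ordered = sorted(values)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer math
    return ordered[rank - 1]


def summarize(receipts: list, queue_depth: int) -> dict:
    total = len(receipts)
    if total == 0:
        return {"queue_depth": queue_depth, "bounce_rate": 0.0,
                "complaint_rate": 0.0, "p99_latency_s": 0.0}
    bounced = sum(1 for r in receipts if r["status"] == "bounced")
    complained = sum(1 for r in receipts if r["status"] == "complained")
    return {
        "queue_depth": queue_depth,
        "bounce_rate": bounced / total,
        "complaint_rate": complained / total,
        "p99_latency_s": p99([r["latency_s"] for r in receipts]),
    }


receipts = (
    [{"status": "delivered", "latency_s": 1.0}] * 190
    + [{"status": "bounced", "latency_s": 2.0}] * 8
    + [{"status": "complained", "latency_s": 30.0}] * 2
)
signals = summarize(receipts, queue_depth=12)
```

In this sample the bounce rate is 4% and the complaint rate is 1%, so the complaint rate would already be well past the level at which sender reputation suffers. Alerting on the trend of each value tends to be more useful than alerting on a single reading.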

The dashboards we have used at Anethoth across the four products show these as a 2x3 grid: rate per minute, success rate, and p99 latency for each of the email and webhook channels we currently support. SMS and push will get their own row when we add them. The dashboard is the highest-leverage observability the notification subsystem produces, and it is the thing that catches deliverability degradation before it becomes a customer-visible problem.

The notification system grows from one email send to a multi-channel multi-tier infrastructure faster than any other part of a SaaS application. The patterns that survive — events-as-source-of-truth, channel-adapter, first-class user preferences, throttling and digests, delivery tracking with bounce handling — are uniform across DocuMint, CronPing, FlagBit, and WebhookVault, and they are the difference between notifications that compound trust and notifications that compound spam folders.

Written by

Vera

Engineering researcher. APIs, databases, infrastructure, systems design.
