
Designing API Quotas: Metering, Billing, and the Trade-offs of Free Tiers

API quotas are the load-bearing constraint of any SaaS pricing model. Get them wrong and either the free tier abuses your infrastructure or the paid tier feels arbitrarily limited. The honest framework separates rate limits from quotas from billing meters, and treats the metering layer ...

Anethoth

06 May 2026 — 5 min read

Quotas are the part of an API where the pricing model meets the infrastructure. The narrative most teams arrive with is that quotas are a feature: pick a number, count requests against it, return 429 when exceeded. The reality is that quotas are three separate concerns wearing the same name — rate limits, monthly allotments, and billing meters — and conflating them produces pricing pages that are confusing for customers and infrastructure that is brittle for operators.

This post covers the three concerns and why they need to live in different parts of the system, the metering schema that survives audits, the free-tier design problem of distinguishing useful evaluation from infrastructure abuse, the overage billing question and why most teams get it backward, and the operational discipline of grandfather clauses when pricing changes.

Three concerns, three implementations

Rate limits protect infrastructure from sudden traffic. They are short-window counters, usually spanning one second to one minute, implemented in memory or in a fast key-value store. The right algorithm is a sliding-window counter, and the right enforcement point is the API gateway, before any application code runs. Customers do not pay for rate limits; they live with them as a property of the API. Rate limits should be roughly uniform across tiers: paid tiers may get somewhat higher limits, but the differences should be small. A rate limit that is dramatically lower on the free tier is a signal of confused pricing, not generous infrastructure.
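The paragraph names the sliding-window counter without spelling it out. Below is a minimal single-process sketch of that algorithm, assuming one counter pair per key held in a Python dict. A gateway would keep the same two counts in the shared key-value store instead. `SlidingWindowCounter` and `allow` are hypothetical names.

```python
import time
from typing import Dict, Optional, Tuple


class SlidingWindowCounter:
    """Approximate sliding-window rate limiter (single-process sketch).

    Keeps a count for the current fixed window and the previous one, and
    weights the previous count by how much of it still overlaps the
    sliding window that ends at `now`.
    """

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        # key -> (window_index, current_count, previous_count)
        self._state: Dict[str, Tuple[int, int, int]] = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        index = int(now // self.window)
        win_idx, current, previous = self._state.get(key, (index, 0, 0))
        if index == win_idx + 1:
            # Moved into the next window: the current count becomes the previous one.
            previous, current = current, 0
        elif index > win_idx + 1:
            # At least one whole window passed with no traffic: nothing overlaps.
            previous, current = 0, 0
        overlap = 1.0 - (now % self.window) / self.window
        estimated = previous * overlap + current
        allowed = estimated < self.limit
        self._state[key] = (index, current + 1 if allowed else current, previous)
        return allowed
```

The estimate weights the previous window by its remaining overlap. That means the limiter needs only two integers per key rather than a log of request timestamps. The cost is an assumption that requests were spread evenly across the previous window.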

Quotas are monthly allotments. They count usage against a billing-period budget and serve a pricing function. They are persisted in the database, tied to the customer's billing state, and survive restarts. The right enforcement point is the application layer, where the customer is identified and the billing relationship is known. Quotas reset on a billing-cycle boundary, which is per-customer rather than the calendar month. Quotas typically have soft and hard limits: warn at 80%, throttle at 100%, and block at 120% with a paywall.
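Two details in this paragraph are easy to get subtly wrong: the per-customer cycle boundary and the threshold comparisons. The sketch below uses the 80/100/120% tiers from the text. It assumes a cycle anchored on a day of the month and clamped to the last day for short months. That clamping rule is an assumption, and billing providers differ on it. All names are hypothetical.

```python
import calendar
from datetime import date
from enum import Enum


class QuotaStatus(Enum):
    OK = "ok"
    WARN = "warn"          # at or above 80% of the allotment
    THROTTLE = "throttle"  # at or above 100%
    BLOCK = "block"        # at or above 120%: show the paywall


def quota_status(used: int, allotment: int) -> QuotaStatus:
    """Map usage in the current billing period to a limit tier (integer math, no floats)."""
    if used * 100 >= allotment * 120:
        return QuotaStatus.BLOCK
    if used * 100 >= allotment * 100:
        return QuotaStatus.THROTTLE
    if used * 100 >= allotment * 80:
        return QuotaStatus.WARN
    return QuotaStatus.OK


def period_start(anchor_day: int, today: date) -> date:
    """Start of the billing period containing `today` for a customer whose cycle
    renews on `anchor_day`; anchors past a month's end clamp to its last day."""

    def anchored(year: int, month: int) -> date:
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(anchor_day, last_day))

    start = anchored(today.year, today.month)
    if start <= today:
        return start
    # This month's renewal has not happened yet, so the period began last month.
    if today.month == 1:
        return anchored(today.year - 1, 12)
    return anchored(today.year, today.month - 1)
```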

Billing meters are the source of truth for invoicing. They are append-only event logs that record metered usage with enough fidelity to reconstruct any monthly bill, defend against disputes, and run usage analytics. The retention requirement is years, not months. Billing meters should never be the same data structure as quota counters — quotas can be aggregations of meter events, but the meter is the canonical record.

The metering schema

The metering schema we have used across DocuMint, FlagBit, and WebhookVault is a single table with one row per metered event: (customer_id, metric, ts, quantity, request_id, metadata). The composite index (customer_id, metric, ts) serves both quota lookups and billing-period aggregations. A unique constraint on request_id makes inserts idempotent: duplicate writes from retries collapse to a single billable event. The metadata JSON column captures whatever else might be needed for analytics or dispute resolution, such as the API endpoint, response status, and payload size.
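Here is a minimal sketch of that schema. It uses SQLite so the example runs self-contained; the post does not say which database backs the meter. Storing `ts` as Unix seconds, the table name `meter_events`, and the helper functions are assumptions. `ON CONFLICT (request_id) DO NOTHING` requires SQLite 3.24 or later.

```python
import json
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS meter_events (
    customer_id TEXT    NOT NULL,
    metric      TEXT    NOT NULL,
    ts          INTEGER NOT NULL,              -- Unix seconds, UTC
    quantity    INTEGER NOT NULL,              -- negative for refunds/corrections
    request_id  TEXT    NOT NULL UNIQUE,       -- idempotency key
    metadata    TEXT    NOT NULL DEFAULT '{}'  -- JSON: endpoint, status, size
);
CREATE INDEX IF NOT EXISTS meter_events_lookup
    ON meter_events (customer_id, metric, ts);
"""


def record_event(conn: sqlite3.Connection, customer_id: str, metric: str, ts: int,
                 quantity: int, request_id: str, metadata: Optional[dict] = None) -> bool:
    """Append one meter event. A retried write with the same request_id is a
    no-op. Returns True if a new row was written; the caller commits."""
    cur = conn.execute(
        "INSERT INTO meter_events "
        "(customer_id, metric, ts, quantity, request_id, metadata) "
        "VALUES (?, ?, ?, ?, ?, ?) ON CONFLICT (request_id) DO NOTHING",
        (customer_id, metric, ts, quantity, request_id, json.dumps(metadata or {})),
    )
    return cur.rowcount == 1


def period_usage(conn: sqlite3.Connection, customer_id: str, metric: str,
                 start_ts: int, end_ts: int) -> int:
    """Net quantity in [start_ts, end_ts), served by the composite index."""
    row = conn.execute(
        "SELECT COALESCE(SUM(quantity), 0) FROM meter_events "
        "WHERE customer_id = ? AND metric = ? AND ts >= ? AND ts < ?",
        (customer_id, metric, start_ts, end_ts),
    ).fetchone()
    return row[0]
```

The unique constraint is on request_id alone. A refund or correction therefore needs its own request ID rather than reusing the one from the event it offsets.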

The append-only discipline is non-negotiable. Meter events are never updated, never deleted, never overwritten. If a charge needs to be refunded, the refund is itself an event with negative quantity. If a meter event was recorded in error, a correction event is appended. The audit trail is the table, and any operation that would break the audit trail is forbidden by application invariant.
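The post enforces append-only as an application invariant. One optional way to back that invariant with the database is a pair of triggers that abort any UPDATE or DELETE. The sketch below shows this in SQLite with hypothetical names. Other databases have equivalents, such as revoking UPDATE and DELETE privileges from the application role. Refunds then become ordinary inserts with a negative quantity.

```python
import sqlite3

APPEND_ONLY_SCHEMA = """
CREATE TABLE meter_events (
    customer_id TEXT    NOT NULL,
    metric      TEXT    NOT NULL,
    ts          INTEGER NOT NULL,
    quantity    INTEGER NOT NULL,
    request_id  TEXT    NOT NULL UNIQUE
);
-- Reject any attempt to rewrite history; corrections must be new rows.
CREATE TRIGGER meter_events_no_update BEFORE UPDATE ON meter_events
BEGIN
    SELECT RAISE(ABORT, 'meter_events is append-only');
END;
CREATE TRIGGER meter_events_no_delete BEFORE DELETE ON meter_events
BEGIN
    SELECT RAISE(ABORT, 'meter_events is append-only');
END;
"""


def create_meter_table(conn: sqlite3.Connection) -> None:
    conn.executescript(APPEND_ONLY_SCHEMA)


def append_event(conn: sqlite3.Connection, customer_id: str, metric: str,
                 ts: int, quantity: int, request_id: str) -> None:
    conn.execute(
        "INSERT INTO meter_events (customer_id, metric, ts, quantity, request_id) "
        "VALUES (?, ?, ?, ?, ?)",
        (customer_id, metric, ts, quantity, request_id),
    )


def refund(conn: sqlite3.Connection, customer_id: str, metric: str,
           ts: int, units: int, request_id: str) -> None:
    """A refund is its own event with a negative quantity, never an edit."""
    append_event(conn, customer_id, metric, ts, -abs(units), request_id)
```

SQLite reports a `RAISE(ABORT, ...)` from a trigger as a constraint error, which Python's `sqlite3` module surfaces as `sqlite3.IntegrityError`.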

Quota enforcement reads the meter table with a SUM aggregation over the current billing period. For high-volume APIs this is too expensive to do per-request, so a separate quota_state table holds the running aggregate per customer with a periodic reconciliation job that re-derives it from the meter table. The reconciliation job catches drift, the quota_state table serves the hot path, and the meter table is the source of truth.
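The following sketches the split between the hot path and reconciliation, again in SQLite with hypothetical table and function names. It assumes the meter insert and the aggregate update share one transaction. That way, a retried request rejected by the idempotency key never inflates the counter. The reconciliation pass then recomputes each aggregate from the meter table and reports any row that had drifted.

```python
import sqlite3
from typing import List, Optional, Tuple

QUOTA_SCHEMA = """
CREATE TABLE meter_events (
    customer_id TEXT NOT NULL, metric TEXT NOT NULL, ts INTEGER NOT NULL,
    quantity INTEGER NOT NULL, request_id TEXT NOT NULL UNIQUE
);
CREATE INDEX meter_events_lookup ON meter_events (customer_id, metric, ts);
CREATE TABLE quota_state (
    customer_id  TEXT    NOT NULL,
    metric       TEXT    NOT NULL,
    period_start INTEGER NOT NULL,
    used         INTEGER NOT NULL,
    PRIMARY KEY (customer_id, metric, period_start)
);
"""

UPSERT_ADD = (
    "INSERT INTO quota_state VALUES (?, ?, ?, ?) ON CONFLICT "
    "(customer_id, metric, period_start) DO UPDATE SET used = used + excluded.used"
)
UPSERT_SET = (
    "INSERT INTO quota_state VALUES (?, ?, ?, ?) ON CONFLICT "
    "(customer_id, metric, period_start) DO UPDATE SET used = excluded.used"
)


def record_and_count(conn: sqlite3.Connection, customer_id: str, metric: str, ts: int,
                     quantity: int, request_id: str, period_start: int) -> None:
    """Hot path: append the meter event and bump the running aggregate atomically."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO meter_events VALUES (?, ?, ?, ?, ?) "
            "ON CONFLICT (request_id) DO NOTHING",
            (customer_id, metric, ts, quantity, request_id),
        )
        if cur.rowcount == 1:  # count only events that were actually new
            conn.execute(UPSERT_ADD, (customer_id, metric, period_start, quantity))


def reconcile(conn: sqlite3.Connection, period_start: int,
              period_end: int) -> List[Tuple[str, str, Optional[int], int]]:
    """Re-derive aggregates from the meter table and overwrite drifted rows.

    Returns (customer_id, metric, cached_used, true_used) for each fix. Rows in
    quota_state with no events in the period are not checked in this sketch.
    """
    fixes = []
    with conn:
        totals = conn.execute(
            "SELECT customer_id, metric, SUM(quantity) FROM meter_events "
            "WHERE ts >= ? AND ts < ? GROUP BY customer_id, metric",
            (period_start, period_end),
        ).fetchall()
        for customer_id, metric, true_used in totals:
            row = conn.execute(
                "SELECT used FROM quota_state "
                "WHERE customer_id = ? AND metric = ? AND period_start = ?",
                (customer_id, metric, period_start),
            ).fetchone()
            cached = None if row is None else row[0]
            if cached != true_used:
                fixes.append((customer_id, metric, cached, true_used))
                conn.execute(UPSERT_SET, (customer_id, metric, period_start, true_used))
    return fixes
```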

Free-tier design

The free-tier design problem is the central pricing decision. A free tier that is too generous attracts users who will never pay; a free tier that is too restrictive cannot be evaluated. The honest framing is that the free tier is an evaluation tool, not a product. Its purpose is to let an integrator validate the API for their use case, not to serve their production traffic.

The DocuMint free tier of 10 invoices per month is calibrated to this framing — enough to validate the integration end-to-end, not enough to serve any meaningful business. The CronPing free tier of 5 monitors is similar: enough to set up monitoring for a personal project, not enough for production infrastructure. The pricing-page case for upgrading is implicit in the free-tier limits: the user who has integrated the API and exceeded the free tier has already done the integration work, and the marginal cost of upgrading is low compared to the cost of switching to a different vendor.

The free-tier abuse pattern to watch for is fan-out — a single user creating many accounts to multiply free-tier capacity. The mitigations are credit-card requirement, email verification, IP-based clustering, and behavioral fingerprinting. The credit-card requirement is the strongest filter and the one with the highest legitimate-user friction. The right choice depends on the abuse-vs-conversion trade-off and is one of the few decisions that has to be re-evaluated when the abuse pattern changes.
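Of the mitigations listed, IP-based clustering is the easiest to sketch. The function below groups free-tier signups by network prefix and flags prefixes with several signups inside a short window. The /24 and /64 prefixes, the 24-hour window, and the threshold of three are illustrative assumptions, not tuned values. The function name is hypothetical. The output is meant as a review queue rather than an automatic ban, because shared NATs and offices produce legitimate clusters.

```python
import ipaddress
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set, Tuple


def suspicious_signup_clusters(
    signups: Iterable[Tuple[str, str, datetime]],
    window: timedelta = timedelta(hours=24),
    threshold: int = 3,
) -> Dict[ipaddress._BaseNetwork, Set[str]]:
    """Flag network prefixes with `threshold` or more signups inside `window`.

    `signups` yields (account_id, ip_string, signed_up_at). IPv4 addresses are
    grouped by /24 and IPv6 addresses by /64.
    """
    by_prefix = defaultdict(list)
    for account_id, ip, at in signups:
        addr = ipaddress.ip_address(ip)
        prefix = 24 if addr.version == 4 else 64
        network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        by_prefix[network].append((at, account_id))

    flagged: Dict[ipaddress._BaseNetwork, Set[str]] = {}
    for network, events in by_prefix.items():
        events.sort()
        start = 0
        # Two-pointer sliding window over signup times within one prefix.
        for end in range(len(events)):
            while events[end][0] - events[start][0] > window:
                start += 1
            if end - start + 1 >= threshold:
                members = {account for _, account in events[start:end + 1]}
                flagged.setdefault(network, set()).update(members)
    return flagged
```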

Overage billing

Overage billing is the question of what happens when a customer exceeds their plan quota. The three options are hard cap, soft cap with overage charges, and soft cap with an overage tier. Most teams default to hard cap — block requests, return 429 with an upgrade prompt. This is the simplest implementation and the worst customer experience. A customer in production who hits the cap during a traffic spike loses business until they upgrade, which they cannot do at 3am during the spike.

The soft cap with overage charges (keep serving traffic, charge per-unit overage on the next invoice) is the customer-friendly default and the one we have moved toward across all four products. The risk is bill shock from runaway usage. It is mitigated by alerts at 80% and 100% of plan, daily caps that prevent unlimited overage, and email notification of overage charges before they appear on the invoice. The implementation is straightforward when the metering schema is right: overage is just a different price applied to events above the plan threshold.
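The claim that overage is "just a different price applied to events above the plan threshold" can be made concrete. This sketch prices one billing period from daily usage totals. It also reports which alert thresholds a usage increment crosses. The plan fields, the price values in the usage below, and the reading of the daily cap as a limit on billable overage units per day are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Plan:
    included_units: int           # monthly allotment covered by the base price
    base_price_cents: int
    overage_cents_per_unit: int
    daily_overage_cap_units: int  # assumption: max overage units accepted per day


def invoice_total_cents(plan: Plan, daily_usage: List[int]) -> Tuple[int, int]:
    """Price one billing period from per-day usage totals, in order.

    Units up to the allotment are covered by the base price; units above it
    are billed as overage, at most `daily_overage_cap_units` per day (requests
    past the cap are assumed to have been rejected). Returns
    (total_cents, billable_overage_units).
    """
    cumulative = 0
    overage_units = 0
    for units in daily_usage:
        before = cumulative
        cumulative += units
        # Portion of today's usage that lies above the plan threshold.
        over_today = max(0, cumulative - max(before, plan.included_units))
        overage_units += min(over_today, plan.daily_overage_cap_units)
    total = plan.base_price_cents + overage_units * plan.overage_cents_per_unit
    return total, overage_units


def crossed_alerts(before: int, after: int, included: int) -> List[int]:
    """Percent-of-plan alert thresholds crossed when usage goes from before to after."""
    return [pct for pct in (80, 100) if before * 100 < pct * included <= after * 100]
```

Take a hypothetical plan with 100 included units, 5 cents per overage unit, and a daily cap of 50. Usage of 60, 60, and 100 units over three days bills 20 overage units on day two and 50 (capped) on day three.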

The soft cap with overage tier is a hybrid: the customer is automatically upgraded to the next tier when they exceed their plan for two billing periods in a row. This handles the case where a customer's actual usage has outgrown their plan and the overage charges are signaling that the plan is wrong. The implementation requires the metering layer to track usage at month-over-month granularity and is genuinely more complex; we have not yet implemented it.
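The post notes that this hybrid has not been implemented. For illustration only, the decision rule itself is small once per-period totals can be read from the meter table. The tier list, the names, and the choice to move exactly one tier up are hypothetical.

```python
from typing import List, Optional, Tuple


def upgrade_target(tiers: List[Tuple[str, int]], current: str,
                   period_usage: List[int], periods: int = 2) -> Optional[str]:
    """Decide whether to move a customer up one tier.

    `tiers` is ordered smallest first as (name, included_units);
    `period_usage` holds totals for completed billing periods, oldest first.
    Returns the next tier's name if usage exceeded the current allotment in
    each of the last `periods` periods, otherwise None.
    """
    names = [name for name, _ in tiers]
    idx = names.index(current)
    included = tiers[idx][1]
    recent = period_usage[-periods:]
    over_every_period = len(recent) == periods and all(u > included for u in recent)
    if over_every_period and idx + 1 < len(tiers):
        return tiers[idx + 1][0]
    return None
```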

Grandfather clauses

The pricing change is the operationally hardest moment in a SaaS business. Existing customers were charged X for Y; the new pricing charges X' for Y' that may be a different shape. The discipline that prevents this from being chaotic is grandfather clauses: existing customers continue on their original plan until they choose to switch, and the new pricing applies only to new signups and explicit migrations.

The implementation requires the customer record to carry the plan version they signed up under, with the pricing engine evaluating quotas and rates against that historical plan rather than the current one. The complexity grows linearly with the number of plan versions. Keeping the plan-version count manageable depends on retiring old plans, for example by reaching out to customers on plan v1 with a migration offer six months before sunsetting it; this discipline is the only way to do so. The Stripe data model supports this through products and prices, where each plan version can map to its own price ID. The right enforcement point is the pricing-engine module rather than logic scattered through the application.
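Below is a minimal sketch of plan-version pinning inside a pricing-engine module. The catalog contents, field names, and prices are hypothetical. In a Stripe-backed system each `PlanVersion` would typically correspond to its own price ID, but no Stripe calls are shown here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PlanVersion:
    plan_id: str
    version: int
    included_units: int
    base_price_cents: int
    overage_cents_per_unit: int
    retired: bool = False  # closed to new signups; existing customers stay on it


# Hypothetical catalog of every version any customer can still be on.
CATALOG: Dict[Tuple[str, int], PlanVersion] = {
    ("pro", 1): PlanVersion("pro", 1, 5_000, 1900, 1, retired=True),
    ("pro", 2): PlanVersion("pro", 2, 4_000, 2900, 1),
}


@dataclass(frozen=True)
class Customer:
    customer_id: str
    plan_id: str
    plan_version: int  # pinned at signup or at an explicit migration


def plan_for(customer: Customer) -> PlanVersion:
    """Resolve the plan the customer is actually on, never simply the latest one."""
    return CATALOG[(customer.plan_id, customer.plan_version)]


def offered_version(plan_id: str) -> PlanVersion:
    """Plan offered to new signups: the highest non-retired version."""
    candidates = [p for (pid, _), p in CATALOG.items() if pid == plan_id and not p.retired]
    return max(candidates, key=lambda p: p.version)


def migrate(customer: Customer, to_version: int) -> Customer:
    """Explicit migration: the only path that changes a customer's pinned version."""
    if (customer.plan_id, to_version) not in CATALOG:
        raise KeyError(f"unknown plan version {customer.plan_id} v{to_version}")
    return Customer(customer.customer_id, customer.plan_id, to_version)
```

Because `plan_for` always reads the pinned version, a pricing change is just a new catalog entry. Existing customers are unaffected until `migrate` is called for them.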

Quotas are the place where the business model and the infrastructure meet. The discipline is in keeping the three concerns separate, in treating the metering layer as a database problem rather than a counter problem, in calibrating the free tier to evaluation rather than production use, in defaulting to soft caps with overage charges rather than hard caps, and in maintaining grandfather clauses through pricing changes. None of this is exciting. All of it compounds, and getting it right early is much cheaper than fixing it later when the meter table has grown to billions of rows and the legacy plan versions number in the dozens.

Across the four Anethoth products, this is implemented uniformly: DocuMint meters PDF generations, CronPing meters monitor pings, FlagBit meters flag evaluations, and WebhookVault meters webhook captures. The same schema, the same separation of concerns, and the same grandfather discipline across all four. Boring is the highest praise pricing infrastructure can receive.
