You set a DNS TTL of 60 seconds because you want fast failover. Something goes wrong, you update the DNS record, and you wait. Five minutes later your monitoring still shows traffic hitting the old IP. Twenty minutes after that, things start clearing up. Your TTL was 60 seconds.
Here is why this happens and where each caching layer is ignoring the number you set.
TTL is advisory, not mandatory
RFC 1035 defines TTL as "the time interval that the resource record may be cached before the source of the information should again be consulted." May be cached. The spec treats the TTL as a maximum, but nothing enforces it. Resolvers are free to cache for less time (they sometimes do, especially under memory pressure), and in practice some also cache for longer than the TTL allows.
The practical implication: your TTL sets the best-case propagation time. Under ideal conditions, no cache serves the old record for longer than the TTL. Real conditions are not ideal.
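To make "may be cached" concrete, here is a minimal sketch of a cache that treats the TTL as an upper bound: an entry is served until its TTL runs out, and it can also be dropped earlier. The `TTLCache` class and its injectable clock are hypothetical illustrations, not any real resolver's code.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical cache that treats each record's TTL as an upper bound."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # name -> (value, absolute expiry time on the injected clock)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, value: str, ttl: float) -> None:
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # TTL elapsed: the source "should again be consulted".
            del self._entries[name]
            return None
        return value

    def evict(self, name: str) -> None:
        # Dropping an entry early (e.g. under memory pressure) is permitted:
        # the TTL caps how long an entry may live, it does not require it.
        self._entries.pop(name, None)


# Usage with a fake clock so the timeline is explicit.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.put("example.com", "192.0.2.10", ttl=60)
now[0] = 59.0
print(cache.get("example.com"))  # 192.0.2.10
now[0] = 60.0
print(cache.get("example.com"))  # None
```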
Browser DNS cache floors
Chrome caches DNS results for a minimum of approximately 60 seconds regardless of the TTL you set. If your TTL is 30 seconds, Chrome will still hold the record for roughly a minute before re-querying. The maximum Chrome will cache a positive result is 5 minutes, also regardless of TTL. Firefox behaves similarly. Safari's behavior varies by version.
This means that for any user who has already visited your site, a DNS change can take up to a minute to reach them even with a shorter TTL, and up to five minutes with a longer one. If they're in an active session with connection reuse, the browser may keep using the old IP for as long as that connection stays open.
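A floor and a cap together amount to clamping the TTL. The sketch below assumes the approximate Chrome figures described above (a 60-second floor and a 5-minute cap); the function name and defaults are hypothetical and are not taken from Chrome's source.

```python
def browser_hold_seconds(ttl: int, floor: int = 60, cap: int = 300) -> int:
    """Seconds a browser-like cache keeps a positive answer (hypothetical).

    floor/cap default to the approximate figures quoted in the text; they are
    assumptions for illustration, not values read from any browser.
    """
    if floor > cap:
        raise ValueError("floor must not exceed cap")
    # Raise short TTLs to the floor, cut long TTLs down to the cap.
    return max(floor, min(ttl, cap))


for ttl in (30, 120, 3600):
    print(ttl, "->", browser_hold_seconds(ttl))
```

With these assumed defaults, a 30-second TTL is held for 60 seconds, a 120-second TTL for 120 seconds, and a 3600-second TTL for 300 seconds.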
# Chrome's internal DNS cache can be managed at:
# chrome://net-internals/#dns
# Flush it with the "Clear host cache" button, which is useful when debugging.
OS resolver caching
Between the browser and the recursive resolver sits the OS's stub resolver, and it has its own cache. On Linux systems running systemd, systemd-resolved caches DNS responses with TTL-respecting behavior — but also enforces a minimum TTL of 20 seconds for successful responses, regardless of what the authoritative server sent.
macOS has its own DNS cache. Windows has the DNS Client service with a configurable minimum TTL that defaults to 0 but is often set higher by enterprise configuration. Many corporate environments run a local dnsmasq instance that adds another caching layer with its own floor.
The point: before your query even reaches the internet, it may be answered from an OS-level cache with its own minimum retention period. That cache is invisible to you as the domain operator, and it adds propagation delay you cannot directly control.
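The stub-resolver layer can be sketched as a small cache sitting in front of an upstream lookup. The 20-second floor mirrors the systemd-resolved figure above and is an assumption; `StubResolver`, the `upstream` callable, and the fake clock are hypothetical stand-ins used only to show a query being answered locally from a record whose own TTL has already expired.

```python
from typing import Callable, Dict, Tuple


class StubResolver:
    """Hypothetical OS-level stub cache with a minimum-TTL floor.

    `upstream` stands in for the recursive resolver and returns (ip, ttl).
    """

    def __init__(self, upstream: Callable[[str], Tuple[str, int]],
                 clock: Callable[[], float], min_ttl: int = 20) -> None:
        self.upstream = upstream
        self.clock = clock
        self.min_ttl = min_ttl  # assumed floor, mirroring the figure in the text
        self.upstream_queries = 0
        self._cache: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expires_at)

    def resolve(self, name: str) -> str:
        now = self.clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # answered locally; nothing leaves the host
        ip, ttl = self.upstream(name)
        self.upstream_queries += 1
        # Short TTLs are raised to the floor before the answer is cached.
        self._cache[name] = (ip, now + max(ttl, self.min_ttl))
        return ip


t = [0.0]
records = {"example.com": ("192.0.2.10", 5)}  # record published with a 5 s TTL
stub = StubResolver(lambda name: records[name], clock=lambda: t[0])
print(stub.resolve("example.com"))            # 192.0.2.10, cached for 20 s
records["example.com"] = ("198.51.100.7", 5)  # operator changes the record
t[0] = 10.0
print(stub.resolve("example.com"))            # 192.0.2.10: TTL lapsed, floor has not
t[0] = 20.0
print(stub.resolve("example.com"))            # 198.51.100.7
print(stub.upstream_queries)                  # 2
```

In this run the record's 5-second TTL has lapsed by t=10, but the floor keeps the old IP in service until t=20, and only two queries ever reach the upstream resolver.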
Recursive resolver minimums
The recursive resolvers your users point at, whether public services such as Cloudflare's 1.1.1.1 and Google's 8.8.8.8 or the ISP-operated resolvers many users get by default, enforce minimum TTL floors for practical reasons. Cloudflare enforces a minimum of approximately 30 seconds on cached responses, and Google enforces a similar floor. ISP-operated resolvers vary widely; some enforce minimums of 5 minutes or more because they prioritize cache efficiency over TTL precision.
The consequence: if you set a TTL of 10 seconds to enable rapid failover, your recursive resolver is likely caching for 30+ seconds anyway. Your 10-second TTL doesn't propagate the way you expect.
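Recursive resolvers hand the caches below them the remaining TTL, counting down from the moment they cached the record. A minimal sketch, assuming a resolver with the 30-second floor described above; `remaining_ttl` is a hypothetical helper, not a resolver API.

```python
import math
from typing import Optional


def remaining_ttl(authoritative_ttl: int, cached_at: float, now: float,
                  min_ttl: int = 30) -> Optional[int]:
    """TTL a flooring recursive resolver would hand downstream at `now`.

    Hypothetical helper: min_ttl=30 is an assumed floor. Returns None once the
    entry has expired and the resolver would query the authoritative server.
    """
    expires_at = cached_at + max(authoritative_ttl, min_ttl)
    if now >= expires_at:
        return None
    # Remaining time, rounded up to whole seconds since DNS TTLs are integers.
    return math.ceil(expires_at - now)


print(remaining_ttl(10, cached_at=0, now=5))   # 25
print(remaining_ttl(10, cached_at=0, now=25))  # 5
print(remaining_ttl(10, cached_at=0, now=30))  # None
```

Twenty-five seconds after caching a record published with a 10-second TTL, this resolver is still handing out the old answer and telling downstream caches they may keep it for another 5 seconds.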
CDN edge caching
If your traffic routes through a CDN, there is another layer. CDN edge nodes resolve your origin's hostname themselves when establishing connections to it, and they cache that resolved IP under their own retention policy, which may or may not respect your DNS TTL. Cloudflare, Fastly, and others maintain their own DNS resolution caches at edge PoPs, and these caches are designed for performance, not strict TTL compliance.
When you update a DNS record, the CDN edge may continue routing to the old origin for minutes after the TTL has expired, depending on how recently its local cache was populated and whether it has any mechanism to flush it.
Practical propagation is 5–30 minutes, not your TTL
Stacking these layers:
- Browser cache: 60–300 seconds
- OS stub resolver: 20+ seconds
- Recursive resolver: 30–300+ seconds
- CDN edge: variable, often 60–600 seconds
For a change affecting users who have recently resolved your domain, real-world propagation in the best case is 5 minutes and in typical cases is 20–30 minutes. The enterprise and ISP resolver tail extends this further — some caches will hold your old record for an hour regardless of your TTL.
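Stacked caches compound rather than overlap. Each cache refreshes from the one above it, and a refresh just before the upstream entry expires returns a remaining TTL near zero, which the downstream cache then raises to its own floor. Under that simplified model, the worst case is the first layer's hold time plus the floor of every layer below it. The sketch uses the floors listed above; `Layer`, `worst_case_staleness`, and the 86400-second caps are hypothetical, and the model ignores connection reuse and CDN edge caching.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    floor: int  # shortest time this layer caches an answer, in seconds
    cap: int    # longest time this layer caches an answer, in seconds


def worst_case_staleness(authoritative_ttl: int, chain: List[Layer]) -> int:
    """Seconds after a change until the last layer stops serving the old record.

    `chain` runs from the cache nearest the authoritative server down to the
    browser. Simplified model: the first layer fetched the old record just
    before the change; each later layer refreshed just before the layer above
    expired, got a near-zero remaining TTL, and raised it to its own floor.
    """
    if not chain:
        raise ValueError("chain must contain at least one layer")
    first, *rest = chain
    stale_for = max(first.floor, min(authoritative_ttl, first.cap))
    for layer in rest:
        stale_for += layer.floor
    return stale_for


chain = [
    Layer("recursive resolver", floor=30, cap=86400),
    Layer("OS stub resolver", floor=20, cap=86400),
    Layer("browser", floor=60, cap=300),
]
print(worst_case_staleness(10, chain))  # 110
print(worst_case_staleness(60, chain))  # 140
```

Replacing the first layer with an ISP resolver that floors at 300 seconds gives 380 seconds. Connection reuse, CDN caching, and caches that hold records longer than any floor push real-world numbers higher still.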
This is not a bug. It's a deliberate design trade-off: aggressive caching dramatically reduces load on recursive resolvers and authoritative servers. The DNS infrastructure was not designed around the assumption that operators would need sub-minute propagation for failover.
What to do
Lower TTL in advance, not at incident time. If you know a change is coming, set your TTL to 60–120 seconds a day before; the lead time must be at least as long as the old TTL. Once the old TTL has expired everywhere, caches hold the record with the shorter duration, so when you make the change, propagation is governed by your new low TTL (plus the caching floors described above) rather than whatever you were running previously (often 3600 seconds).
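The timing rule behind lowering the TTL in advance is simple arithmetic: a cache that fetched the record just before you lowered the TTL may hold it for the full old TTL. A minimal sketch; `earliest_safe_change` and its `margin` parameter are hypothetical names, with `margin` standing in for the extra buffer a full day provides against caches that overstay the TTL.

```python
from datetime import datetime, timedelta


def earliest_safe_change(ttl_lowered_at: datetime, old_ttl_seconds: int,
                         margin: timedelta = timedelta(0)) -> datetime:
    """Earliest time TTL-respecting caches all hold the record with the new TTL.

    A cache that fetched the record just before the TTL was lowered may keep
    it for the full old TTL. `margin` is a hypothetical extra buffer for
    caches that hold records longer than their TTL.
    """
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds) + margin


lowered = datetime(2024, 5, 1, 9, 0)
print(earliest_safe_change(lowered, 3600))                       # 2024-05-01 10:00:00
print(earliest_safe_change(lowered, 3600, timedelta(hours=23)))  # 2024-05-02 09:00:00
```

Lowering a 3600-second TTL at 09:00 makes 10:00 the earliest point at which TTL-respecting caches hold the short TTL; a 23-hour margin lands on 09:00 the next day, in line with the day-ahead advice.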
Use health-check-based failover at the DNS provider level. Route 53, Cloudflare, and other managed DNS providers offer health-check-integrated DNS, where they will automatically switch the record to a backup IP if the primary fails a health check. This happens at the authoritative server level and propagates to recursive resolvers as a normal TTL-governed update, but you get a head start because the provider changes the record as soon as the check fails rather than waiting for you to notice and update it manually. Combined with a low TTL maintained in advance, this is the practical approach to DNS-based failover.
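Health-check-driven failover boils down to a small state machine on the provider's side. This is a hypothetical model, not Route 53's or Cloudflare's API: the record serves `primary` until it fails `threshold` consecutive checks, then serves `secondary` until the primary passes `threshold` consecutive checks again.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverRecord:
    """Hypothetical provider-side failover logic, not any vendor's API."""

    primary: str
    secondary: str
    threshold: int = 3  # consecutive results needed to change state
    failures: int = field(default=0, init=False)
    successes: int = field(default=0, init=False)
    failed_over: bool = field(default=False, init=False)

    def observe(self, healthy: bool) -> None:
        """Record one health-check result for the primary."""
        if healthy:
            self.successes += 1
            self.failures = 0
            if self.failed_over and self.successes >= self.threshold:
                self.failed_over = False  # primary has recovered
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.threshold:
                self.failed_over = True

    def answer(self) -> str:
        """IP the authoritative server hands out right now."""
        return self.secondary if self.failed_over else self.primary


record = FailoverRecord("192.0.2.10", "198.51.100.7")
for healthy in (True, False, False, False):
    record.observe(healthy)
print(record.answer())  # 198.51.100.7
```

The switch happens at the authoritative server, so clients still see the old answer for as long as the caches described earlier hold it. As an illustrative assumption, with checks every 10 seconds and a threshold of 3, the record flips roughly 30 seconds after the primary goes down, and the caching delays come on top of that.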
For critical failover, don't rely on DNS. DNS-based failover is slow. If you need sub-minute failover, use load balancer health checks, anycast routing, or application-layer failover that doesn't require DNS propagation. DNS is a good signal for where to route traffic over minutes, not seconds.
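For failover that does not wait on DNS at all, the client can hold a list of endpoints and move to the next one when a connection attempt fails. A minimal sketch, assuming the primary and backup addresses come from configuration or service discovery; `connect_with_failover` and the 2-second timeout are hypothetical choices.

```python
import socket
from typing import Iterable, Optional, Tuple

Address = Tuple[str, int]


def connect_with_failover(addresses: Iterable[Address],
                          timeout: float = 2.0) -> socket.socket:
    """Return a socket connected to the first reachable (host, port), in order.

    Hypothetical helper: endpoints come from configuration or service discovery,
    so failing over costs a connect attempt, not a DNS propagation cycle.
    """
    last_error: Optional[OSError] = None
    for address in addresses:
        try:
            return socket.create_connection(address, timeout=timeout)
        except OSError as exc:  # refused, unreachable, or timed out
            last_error = exc
    raise ConnectionError(f"all endpoints failed; last error: {last_error}") from last_error
```

Moving to the backup costs roughly one connect timeout per unreachable endpoint rather than a DNS propagation cycle. Passing IP literals keeps DNS out of the path entirely.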
---
Find more writing at anethoth.com. We're building builds.anethoth.com — a directory for indie SaaS projects with transparent revenue.