Smart DNS: Delivering the Best Subscriber Experience
This is the second in a series of blog posts that discuss how smart Domain Name System (DNS) resolvers can enhance ongoing network transformation efforts such as the transition to 5G, better integration of Wi-Fi, and new network designs that optimize the edge to improve the subscriber experience, service delivery, and network efficiency.
The presence of public "over-the-top" DNS resolution alternatives is a strong motivator for internet service providers (ISPs) to make their DNS resolution infrastructure the best it can be. Resolvers are the glue that binds subscribers to their fixed and mobile broadband services. If operators of public DNS services persuade subscribers to use their resolvers, they take control of a significant part of the user experience and earn the associated goodwill. Worse, when a public DNS service fails, subscribers are likely to blame their service provider, because they may not understand the critical role DNS plays, or may not even remember that they switched their DNS settings!
A subscriber's internet experience is closely tied to the performance of the DNS resolution infrastructure. Several factors affect DNS load, which in turn may affect user experience:
Web pages are becoming more DNS-intensive because they composite content from many different sources, each requiring one or more resolutions
Web browsers have implemented prefetch features that issue DNS queries for links on web pages and cache the results so they are available immediately if a user clicks
Web resources increasingly use shorter TTLs to ensure the best possible server is chosen at any given moment; shorter TTLs also mean more frequent refreshes of resolver caches, which result in more recursive queries and thus more load (see the sketch after this list)
Subscribers have phones, tablets, and PCs they engage with intensively, generating large volumes of DNS traffic; even cameras, digital assistants, and other intelligent "things" connected in home and small business networks generate queries
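To make the TTL point concrete, here is a rough, purely illustrative sketch (the TTL values are hypothetical, not measurements) of how shortening a record's TTL increases the refresh traffic a resolver must generate for a continuously requested name:

```python
# Rough illustration (hypothetical TTL values) of how shorter TTLs increase
# recursive load: a name that is queried continuously must be re-resolved
# roughly once per TTL, so halving the TTL doubles the refresh traffic.

def refreshes_per_hour(ttl_seconds: float) -> float:
    """Upper-bound cache refresh rate for a continuously requested name."""
    return 3600.0 / ttl_seconds

for ttl in (300, 60, 30, 5):  # TTLs in seconds, typical of CDN-style records
    print(f"TTL {ttl:>3}s -> up to {refreshes_per_hour(ttl):>5.0f} recursive refreshes/hour")
```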
Akamai has supplied resolution infrastructure to ISPs and mobile network operators (MNOs) for nearly 20 years, and has watched resolution traffic increase by about 20% per year on average. In today's fixed networks, a subscriber account can generate 10,000 queries per day, and a mobile device generates around 1,000 queries per day. Providers need to design their networks to account for escalating demands on resolvers.
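As a back-of-the-envelope illustration of what those figures mean for capacity planning, the sketch below converts per-subscriber daily query counts into an aggregate query rate. The subscriber counts and busy-hour multiplier are assumptions chosen for illustration, not recommendations:

```python
# Back-of-the-envelope sizing sketch using the per-subscriber figures above
# (10,000 queries/day fixed, 1,000 queries/day mobile). Subscriber counts and
# the busy-hour multiplier are assumptions; use measured traffic in practice.

SECONDS_PER_DAY = 86_400

def average_qps(subscribers: int, queries_per_day: int) -> float:
    return subscribers * queries_per_day / SECONDS_PER_DAY

fixed_subs, mobile_subs = 1_000_000, 3_000_000  # hypothetical subscriber base
peak_factor = 3                                  # assumed busy-hour multiplier

avg = average_qps(fixed_subs, 10_000) + average_qps(mobile_subs, 1_000)
print(f"average ~{avg:,.0f} qps; plan for ~{avg * peak_factor:,.0f} qps at peak")
```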
DNS resolvers need to be situated at the network edge, as close to subscribers as possible, to minimize transit delay, which contributes to the DNS latency that shapes a user's overall internet experience. Resolvers can be placed so that subscriber density aligns with resolver performance (queries per second), right-sizing capacity. Distributing resolvers at the network edge has other benefits: more points of presence (PoPs), each serving fewer subscribers, reduces the impact of any single failure, and subscribers fail over to a nearby, low-latency PoP rather than to a distant, centralized secondary.
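One simple way to quantify the latency benefit of an edge PoP is to measure resolution time against candidate resolver locations. The sketch below uses the dnspython library; the resolver addresses (documentation-range IPs) and test names are placeholders, not real infrastructure:

```python
# A minimal latency-comparison sketch using the dnspython library
# (pip install dnspython). The resolver IPs below are documentation-range
# placeholders, and the test names are examples only.
import time
import dns.resolver

def median_latency_ms(resolver_ip: str, names: list[str], samples: int = 5) -> float:
    r = dns.resolver.Resolver(configure=False)
    r.nameservers = [resolver_ip]   # query only the resolver under test
    r.lifetime = 2.0                # per-query timeout, in seconds
    timings = []
    for _ in range(samples):
        for name in names:
            start = time.perf_counter()
            r.resolve(name, "A")
            timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return timings[len(timings) // 2]

names = ["example.com", "example.net", "example.org"]
for label, ip in (("edge PoP", "192.0.2.53"), ("central PoP", "198.51.100.53")):
    print(f"{label}: {median_latency_ms(ip, names):.1f} ms median")
```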
It's also important to validate the resolution network design to see what users will actually experience. The most important thing to test is recursion. A resolver typically answers about 90% of queries from its cache; the rest require recursive resolution because entries age out or queries arrive for names that aren't cached. Because the cache hit percentage varies, it's worth testing a range of cache hit rates to see how a resolver behaves across the operating conditions it will encounter in production. Testing high rates of recursion also shows how well the resolver stands up to attacks or to unusual query volumes and patterns.
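A minimal sketch of that kind of test, assuming the dnspython library and a resolver you operate yourself, controls the cache-hit rate by mixing repeated popular names with unique randomized names that force recursion. A dedicated load generator such as dnsperf is better suited to production-scale testing, but the idea is the same:

```python
# Illustrative test sketch (dnspython): control the cache-hit rate by mixing
# repeated "popular" names with unique randomized names that force recursion.
# The resolver address and names are placeholders; run this only against
# resolvers you operate.
import random
import string
import dns.exception
import dns.resolver

def pick_name(cache_miss_ratio: float, popular: list[str], zone: str) -> str:
    if random.random() < cache_miss_ratio:
        label = "".join(random.choices(string.ascii_lowercase, k=12))
        return f"{label}.{zone}"      # unique label -> guaranteed recursion
    return random.choice(popular)     # repeated name -> likely cache hit

r = dns.resolver.Resolver(configure=False)
r.nameservers = ["192.0.2.53"]        # resolver under test (placeholder)
popular = ["example.com", "example.net", "example.org"]

for miss_ratio in (0.1, 0.3, 0.5):    # 90%, 70%, and 50% cache-hit scenarios
    for _ in range(100):
        try:
            r.resolve(pick_name(miss_ratio, popular, "example.com"), "A")
        except dns.exception.DNSException:
            pass                       # NXDOMAIN and timeouts are expected here
```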
Provider resolvers are also exposed to a variety of attacks that undermine the subscriber experience by degrading network performance or compromising security and privacy. Attackers are constantly looking for new avenues to exploit, and the DNS is attractive because it offers a great return on investment (ROI): high impact for little effort. Properly securing resolvers is therefore a fundamental network best practice.
Over the years, attackers have developed several techniques that turn the DNS into a distributed denial-of-service (DDoS) vector. It's simple for them to instruct bots to launch volumetric attacks using queries that generate massive volumes of response traffic (DNS amplification), or to use randomized domain names (and, more recently, randomized nameserver names) to stress resolvers' processors and memory by increasing the recursive load.
In the 30+ years since the DNS was conceived, numerous methods for corrupting DNS cache entries have also been discovered, allowing attackers to surreptitiously redirect web traffic and creating significant security and privacy exposure. When a new method of cache poisoning is uncovered, it usually sets off an urgent round of patching to preserve the integrity of the internet.
DNS security belongs in DNS servers! Resolvers need built-in protections against DNS DDoS and resilience against traffic spikes, and network design also influences availability under attack. Against cache poisoning, resolvers equipped with layered defenses that work together as part of a well-tuned software system can dramatically improve protection.
DNS Security Extensions (DNSSEC) validation offers definitive protection against cache poisoning, especially since signing of domains has become far more prevalent. It's worth investigating all of the cache poisoning defenses vendors have implemented to protect unsigned domains. Widely implemented mechanisms like User Datagram Protocol (UDP) source port randomization are table stakes and have limitations. More sophisticated capabilities can intelligently screen answers to recursive queries and discard potentially malicious data in responses. It's inevitable that new cache poisoning attacks will emerge, and a proactive response will always be better than a reactive one.
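As a quick illustration, a resolver's validation behavior can be checked by sending a query with the DO bit set and looking for the AD (authenticated data) flag in the response. The sketch below uses dnspython; the resolver address is a placeholder:

```python
# A small check (dnspython): send a query with the DO bit set and look for the
# AD (authenticated data) flag, which a validating resolver sets for answers it
# has verified. The resolver address is a placeholder.
import dns.flags
import dns.message
import dns.query
import dns.rdatatype

RESOLVER = "192.0.2.53"  # placeholder: the resolver being checked

query = dns.message.make_query("example.com", dns.rdatatype.A, want_dnssec=True)
response = dns.query.udp(query, RESOLVER, timeout=2.0)

if response.flags & dns.flags.AD:
    print("answer validated by the resolver (AD flag set)")
else:
    print("no validation signal (AD flag not set)")
```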
Vulnerabilities in DNS code have been publicized for many years, with dramatic differences among different code bases. It's worthwhile to assess the history of a particular code base to gauge potential exposure. It will also offer a sense of the ongoing effort to patch, and associated stress on operations teams and processes.
Finally, new standards, DNS over Transport Layer Security (DoT) and DNS over HTTPS (DoH), define how transactions between clients (stub resolvers) and resolvers are encrypted. This adds yet another dimension to the DNS resolution picture, which will be covered in a future post in this series.
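As a small preview, an encrypted client-to-resolver transaction can be exercised with dnspython's DNS-over-HTTPS support; the endpoint URL below is a placeholder for whatever DoH service a resolver operator exposes:

```python
# A brief sketch of an encrypted client-to-resolver transaction using
# dnspython's DNS-over-HTTPS support (pip install "dnspython[doh]").
# The endpoint URL is a placeholder, not a real service.
import dns.message
import dns.query
import dns.rdatatype

DOH_URL = "https://doh.example.net/dns-query"  # placeholder DoH endpoint

query = dns.message.make_query("example.com", dns.rdatatype.A)
response = dns.query.https(query, DOH_URL)     # the query travels inside TLS
print(response.answer)
```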