Akamai Develops Real-Time Detections for DNS Exfiltration

Written by

Yarin Ozery

August 17, 2023

Yarin Ozery is a Security Researcher and Software Engineer at Akamai, working on innovative data-driven solutions to various security problems. His passion lies in developing end-to-end scalable high-performance security solutions.

Akamai’s innovative DNS exfiltration algorithm is implemented in our DNS security solutions to deliver real-time detections for previously unseen exfiltration malware.

Internet operations rely heavily on the Domain Name System (DNS) protocol, which is used to translate memorable domain names into IP addresses. Many services count on DNS, making it one of the most important and fundamental internet protocols.

Enterprise and government IT departments use various defense mechanisms — like installing antivirus software on internal machines, deploying firewalls, and monitoring network traffic flows — to protect users from malicious actors. However, since the DNS protocol is integral to giving users internet access, it’s often poorly monitored and left unblocked.

Malicious actors target enterprises and government organizations for data theft using various methods. Because DNS traffic is often poorly monitored and rarely blocked, attackers exploit the protocol to open a covert communication channel between compromised hosts and attacker-controlled systems, such as command and control (C2) servers, and use that channel to exchange data. This practice is known as DNS exfiltration.

How DNS exfiltration works

DNS exfiltration embeds the data to be exfiltrated in DNS queries and responses. This process includes five high-level steps.

  1. Data encoding. The data to be exfiltrated is encoded or broken down into smaller chunks — typically binary- or Base64-encoded — to fit into DNS query or response messages. This ensures that the data can be transmitted through the limited space available in DNS queries.

  2. DNS tunneling. The malware on the compromised host establishes a covert communication channel with a DNS server that the attacker controls. This server could be a legitimate, compromised server or a malicious DNS server established by the attacker.

  3. DNS requests. The attacker's system generates DNS requests containing the encoded data as part of the query or response fields. These requests are sent to the controlled DNS server, which acts as the exfiltrated data’s receiver.

  4. Data extraction. On the attacker's side, the controlled DNS server receives the DNS requests, extracts the encoded data from the queries or responses, and decodes it back to its original form.

  5. Data reconstruction. Once the attacker has received all the encoded data, they can reconstruct the original information. This could involve reassembling the data chunks or applying any necessary decryption algorithms.
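To make the encoding and reconstruction steps concrete, here is a minimal sketch that only builds and parses query names and sends no traffic. The function names, the `attacker.example` domain, the sequence-number prefix, and the choice of Base32 (which survives DNS case-insensitivity) are illustrative assumptions, not a description of any specific malware.

```python
import base64

MAX_LABEL = 63  # maximum length of a single DNS label (RFC 1035)


def encode_to_query_names(data: bytes, domain: str, chunk_size: int = 30) -> list[str]:
    """Split data into chunks and embed each chunk as one subdomain label."""
    names = []
    for seq, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        # Base32 output is case-insensitive; '=' padding is not valid in a hostname.
        encoded = base64.b32encode(chunk).decode("ascii").rstrip("=").lower()
        label = f"{seq}-{encoded}"  # hypothetical framing: sequence number, then payload
        assert len(label) <= MAX_LABEL, "chunk_size too large for one DNS label"
        names.append(f"{label}.{domain}")
    return names


def decode_from_query_names(names: list[str], domain: str) -> bytes:
    """Receiver side: extract, reorder, and decode the chunks."""
    suffix = "." + domain
    chunks = {}
    for name in names:
        if not name.endswith(suffix):
            continue  # query for some other domain
        seq, encoded = name[:-len(suffix)].split("-", 1)
        padded = encoded.upper() + "=" * (-len(encoded) % 8)  # restore padding
        chunks[int(seq)] = base64.b32decode(padded)
    return b"".join(chunks[i] for i in sorted(chunks))
```

From a defender's point of view, the takeaway is that every query carries a different, long, high-entropy subdomain under the same registered domain.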

The attacker may also use obfuscation techniques to blend exfiltration traffic with legitimate DNS traffic, making detection more challenging. Most significantly, attackers normally use a low and slow approach to avoid detection. For example, exfiltration DNS queries may occur only once every 10 seconds.
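A quick back-of-the-envelope calculation shows why a low and slow channel still matters. The 30-byte payload per query is an assumed figure for illustration; the 10-second interval comes from the example above.

```python
def daily_exfil_bytes(seconds_between_queries: float, payload_bytes_per_query: int) -> int:
    """Bytes that leave the network per day at a fixed query rate."""
    queries_per_day = int(24 * 60 * 60 / seconds_between_queries)
    return queries_per_day * payload_bytes_per_query


# One query every 10 seconds, 30 payload bytes each (assumed):
print(daily_exfil_bytes(10, 30))  # 8640 queries * 30 bytes = 259200
```

At that rate, roughly 259 KB leaves the network each day, even though the query rate itself looks unremarkable.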

What are DNS servers?

If you’re unsure about how DNS servers work, take a look at Figure 1.

Fig. 1: DNS servers translate readable domain hostnames into IP addresses that machines can read

Identifying DNS exfiltration

Many enterprise organizations look for more advanced security tools that can identify and block DNS exfiltration attempts. Akamai continues to develop novel detection techniques based on our extensive DNS exfiltration research over the past six years. This paper details the extensive academic research that enabled the development of a new detection algorithm that enhances DNS exfiltration protection in our security products.

Akamai Secure Internet Access Enterprise is a solution that detects DNS exfiltration attempts and issues customer alerts. In the vast majority of these events, the exfiltration is part of a penetration testing exercise, but the service also detects genuine DNS exfiltration incidents by cyberattackers.

Improving DNS exfiltration detection

To further improve our DNS exfiltration detection capabilities, Akamai’s data science team is developing a novel, real-time, information-based DNS exfiltration detection algorithm.

This new algorithm was inspired by the concept of Information-based Heavy Hitters (ibHH) in a DNS query stream. ibHHs are essentially domains associated with a large amount of unique information conveyed over queries’ subdomains. 

Our algorithm helps organizations estimate the amount of distinct information conveyed from users to registered domains over subdomains. When the amount of information conveyed to a domain exceeds a predefined threshold, the algorithm generates and reports an exfiltration alert.

Advancements to DNS exfiltration detection

Our algorithm offers significant advancements to modern DNS exfiltration detection. These include: 

  • Versatility. Akamai’s algorithm is extremely lightweight, so it can be used by small low-throughput networks and large high-throughput networks alike — with ibHH handling more than one million queries per second. By contrast, other options can only handle thousands of queries per second. This means the algorithm can be deployed directly to resource-constrained DNS resolvers, since the resolver’s performance won’t be impacted. 

  • Rigor. The algorithm has been evaluated against earlier detection methods using two of the largest DNS datasets in the industry, including approximately 50 billion queries from one week’s worth of Secure Internet Access Enterprise customer logs. 

  • Accessibility. The algorithm was written in open source Python so researchers can easily evaluate the performance of their own detection methods against our algorithm.

Unpacking the problem

DNS exfiltration is the act of abusing the DNS protocol to facilitate data exchange. The attacker registers a domain for which they configure an authoritative name server that they control (e.g., attacker.com). When the malware compromising the victim’s host attempts to exfiltrate sensitive private information to the attacker — like a client’s credit card details from a point-of-sale machine — it encodes that information and sends a DNS query request in a “<encoded_credit_card_details>.attacker.com” format.

The query propagates to the authoritative attacker.com name server controlled by the attacker — all resulting in a successful data exchange from malware to attacker.

Since DNS authoritative nameservers can send DNS query responses, the attacker can encode a message to the malware through the DNS response. This enables the DNS exfiltration tunnel to facilitate a bidirectional covert communication channel between the malware and the attacker for C2 communication, as in 2022’s B1txor20 botnet campaign.

Examples of data theft via DNS exfiltration

A famous case of data theft via DNS exfiltration was the 2014 FrameworkPOS campaign in which 56 million customers of the American retailer Home Depot had credit card information stolen via DNS exfiltration through point-of-sale systems over six months. Other notable DNS exfiltration cases include those by advanced persistent threat groups like OilRig and Operation Cobalt Kitty.

Illuminating exfiltration detection challenges

Detecting and preventing DNS exfiltration is difficult because of the DNS protocol’s ubiquity, the challenges of overblocking legitimate DNS traffic that can impact user productivity, and the prevalence of using the DNS protocol to establish bidirectional communication for legitimate purposes.

For example, DNS anti-malware services and some honeypot services use the DNS protocol and have similar DNS traffic patterns, which makes it hard to distinguish between malicious and benign DNS exfiltration events.

The timing and positioning of DNS exfiltration detection are also crucial. Detection can occur:

  • In real time on the DNS resolver as the DNS query stream processes

  • Later and offline as DNS queries are consolidated, aggregated, and analyzed

Real-time DNS exfiltration detection

Real-time exfiltration detection is preferable, as it enables faster identification and remediation. However, this approach is much more difficult, since real-time detection algorithms should be fast and memory-efficient to avoid impacting the recursive DNS resolver performance.

Many DNS exfiltration detection algorithms have been proposed in response to the volume of DNS exfiltration research conducted over the last decade. Despite the importance of real-time detection, past research typically focused solely on detection rate improvements. Earlier proposed detection methods were usually based on supervised or unsupervised machine learning algorithms. These existing design methods limit real-time detection capabilities because:

  • Nontrivial feature extraction could not be applied in real time.

  • Complicated models not designed for execution on the network perimeter required data collection in dedicated computation environments.

  • Supervised models are limited by their need for labeled datasets for benign and malicious DNS traffic, which is problematic because no high-quality publicly available DNS query dataset for DNS exfiltration exists. This means associated detection methods can’t accurately identify new and emerging DNS exfiltration malware.

Using ibHH for real-time DNS exfiltration detection

Akamai’s DNS exfiltration detection algorithm can operate in real time directly on Akamai’s carrier-grade recursive DNS network. Our proposed approach, ibHH, is inspired by heavy hitter detection in streams, specifically distinct heavy hitter detection, and can process millions of queries per second. As such, it doesn’t impair the DNS resolver’s responsiveness or ability to handle DNS resolution.

Akamai has proposed the information heavy hitter idea to describe stream elements associated with a large amount of unique information. ibHH detection is based on how the amount of information conveyed from DNS query subdomains to their domains is quantified. Domains associated with large amounts of unique information are identified, and suspected DNS exfiltration domains are reported — all with sublinear memory consumption and performance time optimizations.

The ibHH input is a DNS query stream, such that for each DNS query (e.g., subdomain.example.com) we extract the domain and subdomain to obtain the pair (example.com, subdomain), where example.com is the query’s domain and subdomain is its subdomain.
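As a rough sketch of this extraction step, the function below splits a query name into the (domain, subdomain) pair. It assumes the registered domain is simply the last two labels, which is a simplification: production code would normally consult the Public Suffix List to handle suffixes such as co.uk. The function name and the `registered_labels` parameter are hypothetical.

```python
def split_query(qname: str, registered_labels: int = 2) -> tuple[str, str]:
    """Return (domain, subdomain) for a DNS query name.

    Assumption: the registered domain is the last `registered_labels` labels.
    """
    labels = qname.rstrip(".").lower().split(".")
    if len(labels) <= registered_labels:
        return ".".join(labels), ""  # no subdomain present
    domain = ".".join(labels[-registered_labels:])
    subdomain = ".".join(labels[:-registered_labels])
    return domain, subdomain


print(split_query("subdomain.example.com"))  # ('example.com', 'subdomain')
```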

ibHH consists of a fixed-size cache that stores, with high probability, the information heavy hitters in the stream. The cache size is an input parameter of the algorithm. ibHH also uses a random hash function (Hash ~ U[0,1]) that allows us to sample the distinct DNS query stream, a detection threshold (an algorithm parameter), and a threshold value τ (initialized to one), which is the probability of including a domain in the cache.

Finally, the detection threshold is given to ibHH as input, and when a domain’s information count exceeds this threshold, an alert is raised for the corresponding domain.
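The following is a simplified sketch of how these pieces could fit together, not Akamai’s implementation. The class name `IbHHSketch`, the eviction rule (drop the cached domain with the largest hash seed and lower τ to that seed, as in distinct-sampling schemes), and the use of exact Python sets in place of approximate counters are all assumptions made for illustration.

```python
import hashlib


def uniform_hash(domain: str, subdomain: str) -> float:
    """Map a (domain, subdomain) pair to a pseudo-random value in [0, 1)."""
    digest = hashlib.blake2b(f"{subdomain}|{domain}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


class IbHHSketch:
    def __init__(self, cache_size: int, detection_threshold: int):
        self.cache_size = cache_size
        self.detection_threshold = detection_threshold
        self.tau = 1.0       # sampling threshold, initialized to one
        self.cache = {}      # domain -> {"seed", "seen", "info"}
        self.alerted = set()

    def process(self, domain: str, subdomain: str) -> bool:
        """Process one query; return True if it raises a new alert."""
        h = uniform_hash(domain, subdomain)
        entry = self.cache.get(domain)
        if entry is None:
            if h >= self.tau:
                return False  # pair not sampled, domain stays out of the cache
            entry = {"seed": h, "seen": set(), "info": 0}
            self.cache[domain] = entry
            if len(self.cache) > self.cache_size:
                # Assumed eviction rule: drop the largest seed and lower tau to it.
                victim = max(self.cache, key=lambda d: self.cache[d]["seed"])
                self.tau = self.cache[victim]["seed"]
                del self.cache[victim]
                if victim == domain:
                    return False
        if subdomain not in entry["seen"]:
            # Only previously unseen subdomains add information.
            entry["seen"].add(subdomain)
            entry["info"] += len(subdomain)
            entry["seed"] = min(entry["seed"], h)
        if entry["info"] > self.detection_threshold and domain not in self.alerted:
            self.alerted.add(domain)
            return True
        return False
```

Under this rule, a domain that receives many distinct subdomains tends to hold a small seed and therefore stays in the cache, while domains queried with only a few distinct names are the first to be evicted as τ shrinks.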

Figure 2 illustrates ibHH. An open source Python implementation of our algorithmic approach will soon be available from Akamai’s GitHub.

Fig. 2: An overview of ibHH. When a compromised host performs a DNS query (1), the enterprise DNS gateway intercepts the query, estimates the amount of information it contains, and updates its internal information estimate cache (2). After the update, if the amount of information exceeds a predefined threshold, the query is blocked from reaching the attacker server (3) and an alert is raised to the enterprise’s security operations team (4).

Information quantification

To locate information heavy hitter domains from a DNS query stream, we must quantify the amount of unique information encoded in subdomains and accumulate this amount per domain. To do so, we define the subdomain’s information quantity as its length (i.e., N = |subdomain| is the information amount encoded in the subdomain).
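A minimal exact version of this quantification might look like the sketch below, which keeps every distinct subdomain per domain and sums their lengths. The function name is hypothetical, and the approach is shown only as a reference point: it needs memory proportional to the number of distinct subdomains.

```python
from collections import defaultdict


def unique_information(pairs):
    """Sum the lengths of the distinct subdomains seen for each domain."""
    seen = defaultdict(set)
    for domain, subdomain in pairs:
        seen[domain].add(subdomain)  # repeated subdomains add no new information
    return {domain: sum(len(s) for s in subs) for domain, subs in seen.items()}


stream = [
    ("example.com", "www"),
    ("example.com", "www"),   # duplicate, ignored
    ("example.com", "mail"),
    ("attacker.example", "a" * 40),
    ("attacker.example", "b" * 40),
]
print(unique_information(stream))  # {'example.com': 7, 'attacker.example': 80}
```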

Optimized counting with HLL++

Calculating the exact amount of unique information for each domain would require storing the set of subdomains associated with it, which consumes linear space. Since our solution runs on DNS resolvers with limited memory, we can’t use precise information counters, so we employ count-distinct approximation algorithms instead.

The count-distinct problem is well-studied, and many accurate and performant approximation algorithms exist to address it.

One such solution is HyperLogLog (HLL). Akamai uses an HLL variant called HyperLogLog++ (HLL++) that’s more accurate and uses less memory. We store and use an HLL++ instance for each cached domain to estimate the amount of information conveyed to the cached domain. Finally, once a cached domain’s information count exceeds the detection threshold, an alert is raised.
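To give a feel for how such a counter works, here is a compact sketch of plain HyperLogLog, not HLL++, which adds bias correction and a sparse representation. The `add_information` helper is an assumption: it shows one possible way to make a count-distinct sketch accumulate subdomain length, by inserting one distinct item per character position, and the source does not confirm this is how Akamai does it.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p: int = 12):
        self.p = p                    # number of index bits
        self.m = 1 << p               # number of registers (fixed memory)
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # constant valid for m >= 128

    def add(self, item: str) -> None:
        x = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
        idx = x >> (64 - self.p)                    # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


def add_information(hll: HyperLogLog, subdomain: str) -> None:
    """Assumed weighting trick: a subdomain of length N contributes N distinct items."""
    for i in range(len(subdomain)):
        hll.add(f"{i}:{subdomain}")
```

With p = 12 the sketch uses 4,096 registers regardless of how many subdomains arrive, and re-adding a subdomain that was already seen leaves the estimate unchanged.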

Better DNS exfiltration protection with Akamai

Akamai’s innovative DNS exfiltration algorithm is implemented in our DNS security solutions, like Secure Internet Access Enterprise, to deliver real-time DNS exfiltration detections for previously unseen exfiltration malware. Customers who use it in conjunction with our current exfiltration malware detection algorithm have superior DNS exfiltration protection.

Learn more

Find out more about Akamai’s solution for identifying and analyzing DNS exfiltration.


