Introduction to DNS Data Exfiltration
After the initial publication of this blog post, Asaf Nadler and Avi Aminov wrote a paper on the detection of malicious, low-throughput data exfiltration over the Domain Name System (DNS) protocol. The paper was published by Elsevier; the authors are thrilled to appear in such a prestigious venue, and invite you to read both the paper and this related blog post. As background: the DNS protocol is a naming system for host machines and an essential component of the internet. The number of domains and subdomains on the internet today far exceeds the storage capabilities of a small, simple database. The designers of the DNS foresaw this and built the system as a hierarchical distributed database. Resolving a domain name to the IP address of its host machine starts by querying the root DNS servers (i.e., the head of the hierarchy) and proceeds top-down until it reaches a designated server called an authoritative name server (AuthNS).
Spyware is malicious software (malware) used to gather information about a person or organization without consent. In a typical setting, a remote server that acts as a command and control (C&C) server waits for an incoming connection from the spyware, which carries the gathered information. Statistics reported by Avast estimate that over 100 million types of spyware are currently active worldwide.
In the presence of network security products (e.g., firewalls, secure web gateways, and antivirus software), spyware must communicate with its C&C server over a covert channel in order to prolong its operation. Among commonly used covert channels, the Domain Name System (DNS) protocol stands out.
Data exchange over the DNS protocol
The DNS protocol is a core component of the internet protocol (IP) suite; its main goal is the translation of hostnames to IP addresses. The growing number of domains on the internet today exceeds the storage capabilities of a single database server, so the DNS was designed as a distributed database. Each hostname resolution corresponds to a single server within the distributed database, known as an authoritative name server (AuthNS). Upon a request for a hostname resolution, a DNS client iterates over name servers until it reaches the correct AuthNS, which then replies with the answer for the requested hostname. You can think of the hostname as incoming data for the AuthNS, as displayed in Figure 1 (e.g., when a client requests "passw0rd.exfiltration.com", the AuthNS for "exfiltration.com" receives the input "passw0rd").
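To make the "hostname as incoming data" idea concrete, here is a minimal sketch of how the labels below a zone could be read out of a query name on the AuthNS side. The function name `extract_payload` is hypothetical and not part of any DNS library; it only does string handling and sends no traffic.

```python
from typing import Optional


def extract_payload(qname: str, zone: str) -> Optional[str]:
    """Return the part of qname that sits below zone, or None if qname is outside it."""
    # DNS names are case-insensitive and may carry a trailing root dot.
    q = qname.rstrip(".").lower()
    z = zone.rstrip(".").lower()
    if q == z:
        return ""  # query for the zone itself: no extra labels
    if q.endswith("." + z):
        return q[: -(len(z) + 1)]  # strip ".<zone>" from the end
    return None  # name belongs to a different zone


print(extract_payload("passw0rd.exfiltration.com", "exfiltration.com"))  # passw0rd
```

The same check works from the defender's side: grouping logged queries by registered domain and inspecting what sits below it is a natural first step when looking for data carried in subdomains.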
As a channel for the exchange of data, the DNS protocol is far from optimal with regard to efficiency and reliability. The DNS protocol restricts a queried name (i.e., the outbound message) to 255 bytes, with each dot-separated label limited to 63 bytes, and hostnames are conventionally made up of letters, digits, and hyphens. Also, since DNS is used mostly over the User Datagram Protocol (UDP), there is no guarantee that queries are delivered, or answered, in the order in which they were sent.
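A rough capacity estimate shows how tight these limits are. The sketch below assumes the standard DNS limits (a name of at most 255 octets on the wire, which corresponds to 253 characters in dotted form, and labels of at most 63 characters) and a base32-style encoding carrying 5 bits per character; that encoding is an assumption made here because hostnames are case-insensitive, not something the text prescribes. The function names are hypothetical.

```python
MAX_NAME_CHARS = 253   # dotted form of the 255-octet wire limit
MAX_LABEL_CHARS = 63   # per-label limit


def max_payload_chars(zone: str) -> int:
    """Most characters that fit below `zone`, not counting the dots between labels."""
    # Budget for the payload part, after the zone and the dot that joins it.
    budget = MAX_NAME_CHARS - len(zone.rstrip(".")) - 1
    chars = 0
    while True:
        candidate = chars + 1
        labels = -(-candidate // MAX_LABEL_CHARS)  # ceiling division
        # `labels` labels need `labels - 1` separating dots.
        if candidate + labels - 1 > budget:
            return chars
        chars = candidate


def max_payload_bytes(zone: str, bits_per_char: int = 5) -> int:
    """Raw bytes per query under the assumed encoding (5 bits/char for base32)."""
    return max_payload_chars(zone) * bits_per_char // 8


print(max_payload_chars("exfiltration.com"))  # 233
print(max_payload_bytes("exfiltration.com"))  # 145
```

Under these assumptions a single query carries well under 200 bytes of raw data, which is why transferring anything sizeable over DNS requires a large number of queries.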
Nevertheless, from an attacker's standpoint, the DNS protocol is an excellent covert channel. Because of its crucial role on the internet, misconfiguration of the DNS can lead to network disconnects, and it is therefore rarely restricted by security policies (e.g., allowing resolutions only to specific domain names). In addition, the DNS protocol is often monitored less closely than other internet protocols (e.g., HTTP, FTP, and mail transfer protocols) because it is perceived as posing a lesser risk. The DNS protocol has served as a covert channel in previous cyber campaigns, including the theft of 56 million credit and debit card numbers from Home Depot in 2014 and the theft of 25,000 credit cards from Sally Beauty.
During the last decade, several open-source software programs, as well as spyware, have made use of the DNS protocol for data exchange. While the scheme for data exchange (as described above) remains the same, the communication pattern varies from tool to tool, and the detection techniques vary accordingly. In the next sections, we introduce two classes of data exchange over the DNS protocol, (1) high throughput DNS tunneling and (2) low throughput exfiltration malware, and review existing techniques for their detection.
High throughput DNS tunneling
High throughput DNS tunneling (DNS tunneling) refers to a family of freely available software for data exchange over the DNS protocol. The family includes software such as Iodine, Dns2tcp, and DNSCat. Most of these tools are general purpose, allowing various types of data exchange (e.g., web browsing, file transfer, and remote desktop control).
Although a commonly known, non-malicious use of DNS tunneling is bypassing payment for Wi-Fi access by setting up a DNS tunnel for web browsing, it may also be used as a communication channel between malware and its C&C server. Therefore, the security community has a clear motivation to detect DNS tunneling.
Before discussing the detection of DNS tunneling, we should first address its unique characteristics. Because the DNS protocol runs mostly over UDP, there is no guarantee that messages arrive in the order in which they were sent. DNS tunneling tools handle this either by running TCP communication over the DNS channel, or by sending constant ping messages between requests to maintain the correct order. Applying these methods for the sake of integrity increases the rate of messages over the DNS protocol. Also, when a DNS tunneling tool is used for web browsing or file transfer, the volume and length of messages increase as well compared to normal DNS traffic.
As a result, we expect the presence of DNS tunneling to cause a significant change in DNS traffic with regard to: (1) a higher volume, (2) longer messages, and (3) a shorter mean time between messages (see Figure 2).
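These three characteristics can be computed directly from a query log. The sketch below is a minimal illustration, assuming a hypothetical log format of `(timestamp_in_seconds, query_name)` pairs; the function name and the sample queries are made up for the example.

```python
from statistics import mean


def traffic_features(queries):
    """Summarize a list of (timestamp_seconds, qname) pairs, given in any order."""
    ordered = sorted(queries)
    times = [t for t, _ in ordered]
    # Time gaps between consecutive queries.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "volume": len(ordered),
        "mean_query_length": mean(len(q) for _, q in ordered) if ordered else 0.0,
        "mean_gap_seconds": mean(gaps) if gaps else None,
    }


# Hypothetical sample: two ordinary lookups a minute apart.
sample = [(0, "www.example.com"), (60, "mail.example.com")]
print(traffic_features(sample))
```

Comparing these values for one client or one domain against the rest of the traffic gives a first, coarse indication of tunneling-like behavior.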
Based on this distinguishable behavior, current solutions detect DNS tunneling by relying on the volume and variety of requests that these tools generate. The obvious solution is rate control, which is offered by security vendors. Other, more sophisticated solutions rely on statistical models. Among such models are supervised learning models trained on tunneling versus non-tunneling user traffic, and anomaly detection models that trigger upon a significant change in DNS traffic as a whole. Such models have proven highly effective with regard to recall (i.e., rate of detection) and false positive rates. While the problem of DNS tunneling detection is important and has been studied thoroughly, an entire class of low throughput DNS exfiltration malware has remained overlooked. This class, containing at least nine malware families seen over the last seven years (see Figure 3), is discussed next.
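As a simple illustration of the anomaly detection idea, the sketch below flags an hourly query count that sits far above a historical baseline, using a z-score. This is a stand-in chosen for brevity, not the model used by any particular vendor or paper; the function name, the threshold of 3.0, and the counts are all hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(baseline_counts, current_count, z_threshold=3.0):
    """Flag current_count if it lies more than z_threshold deviations above the baseline.

    baseline_counts needs at least two values for the sample standard deviation.
    """
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)
    if sigma == 0:
        return current_count > mu  # flat baseline: any increase stands out
    return (current_count - mu) / sigma > z_threshold


# Hypothetical hourly DNS query counts for one client.
baseline = [100, 110, 95, 105, 90]
print(is_anomalous(baseline, 2000))  # True: tunnel-like burst
print(is_anomalous(baseline, 101))   # False: one extra query per hour
```

The second call hints at the gap described above: a client that adds a single query per hour stays well inside normal variation, so a volume-based detector of this kind does not notice it.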
Low throughput DNS exfiltration malware
In the case of malware, data exchange over the DNS can avoid both a TCP tunnel and constant pings. Instead, short and sporadic messages can be delivered on rare occasions. For example, a piece of malware "wakes up" once an hour and sends a poll message to its C&C server asking for instructions, or it detects a credit card swipe and sends the card data to the C&C server without waiting for a response.
The malware families listed below (Figure 3) use short and sporadic message exchanges, designed to evade DNS tunneling detection solutions that rely on high message volume, lengthy queries, and high query density.
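| Malware | First Seen On |
| --- | --- |
| [name missing] | 2017 |
| [name missing] | 2016 |
| [name missing] | 2015 |
| JAKU / C3PRO-RACOON | 2015 |
| BernhardPOS | 2015 |
| [name missing] | 2014 |
| PlugX | 2014 |
| FeederBot | 2011 |
| [name missing] | 2011 |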
Figure 3 - List of low throughput DNS exfiltration malware
To the best of our knowledge, the problem of detecting low throughput DNS exfiltration malware had not been studied. In an upcoming blog post, we will elaborate further on the matter and unveil a novel solution aimed at the challenging case of such malware.