DGA Families with Dynamic Seeds: Unexpected Behavior in DNS Traffic


Written by

Connor Faulkner and Stijn Tilborghs

September 06, 2023


Written by

Connor Faulkner

Connor Faulkner has a background in astrophysics and is driven by curiosity and a passion for deciphering complex systems. He is a dedicated Data Analyst in the Akamai Security Intelligence Group that explores the intricate landscape of threat detection.


Written by

Stijn Tilborghs

Stijn Tilborghs is an electronics engineer who decided to move into data science in 2016. His first months of income in the field came from competing for prize money in machine learning hackathons. After working as a freelancer for a few years, he is now part of the Akamai threat research team and seeks innovative solutions to the global and dynamic threat landscape.

A closer look at the Pushdo and Necurs DGA families reveals that they output malicious domains both before and after their expected generation dates.

Editorial and additional commentary by Tricia Howard and Lance Rhodes 

Executive summary

  • Akamai researchers reveal and explain why, in Domain Name System (DNS) traffic data, dynamically seeded domain generation algorithm (DGA) families behave differently from what their reverse-engineered algorithms suggest.

  • The modified behavior suggests that malicious actors are attempting to further increase the DGA families’ capability to extend the lifespan of their command and control (C2) communication channels, thus protecting their botnets.

  • Predicting future-generated domain names is more complex for dynamically seeded DGAs than for statically seeded DGAs.

  • A closer look at the Pushdo and Necurs DGA families reveals that they output malicious domains both before and after their expected generation dates.

Introduction

In this blog post, we will provide a brief overview of DGAs, and then we’ll share some interesting findings.

The Akamai Security Intelligence Group is able to analyze anonymized logs of DNS queries originating from CacheServe DNS servers. As part of our botnet detection efforts, we observe and monitor the real-world behavior of more than 100 known DGA families. 

We found that dynamically seeded DGAs (a subset of DGAs) often behave very differently from what the reverse-engineered algorithm itself suggests. More precisely, we see DGA domain names being activated before their expected generation dates.

What are domain generation algorithms? 

Malware, such as botnets, often needs to communicate with a centralized server to receive commands or updates.

DGAs are algorithms used in malware to generate large numbers of semirandom domain names.

An infected device will regularly attempt to connect to the entire set of algorithmically generated domains provided by the DGA. Only one domain has to be successfully reached to establish a connection with the C2 server. This makes it harder for cybersecurity researchers to take down the C2 communication.
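To make the client side concrete, here is a minimal Python sketch. The domain names, `find_c2_domain`, and the `resolves` callable (a stand-in for a real DNS lookup) are all hypothetical; real families differ in how they order, pace, and validate their lookups.

```python
from typing import Callable, Iterable, Optional


def find_c2_domain(candidates: Iterable[str],
                   resolves: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate domain that resolves, or None if none do."""
    for domain in candidates:
        # A single successful lookup is enough to reach the C2 server.
        if resolves(domain):
            return domain
    return None


# Hypothetical example: only one of the generated domains is registered.
generated = ["kqzvtr.example", "mbxlwo.example", "tyhpse.example"]
registered = {"tyhpse.example"}
print(find_c2_domain(generated, registered.__contains__))  # tyhpse.example
```

Blocking two of the three domains would change nothing here: the bot simply moves on until one lookup succeeds.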

How it works

For example, imagine a botnet using a hypothetical DGA family or variant that generates 500 malicious domain names per day.

An infected device using this DGA family will query all 500 of these domain names each day. The C2 server of the botnet will generate the same 500 domain names each day (we assume the same seed is being used; more on that later). However, the malicious actor only needs to control one of those 500 domains to establish communication with the infected machines (bots).

Sometimes the seed changes, which generates a new set of domains, and the process begins anew. This makes it difficult for security researchers to block the malicious traffic, as the domains change frequently and are often random-looking, such as “ghlidae[.]com”.
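The toy generator below is hypothetical and does not reproduce any real family. It hashes a seed string together with a counter to derive 500 domain names from a small, assumed list of TLDs. Because the bot and the C2 operator run the same code with the same seed, they arrive at the same list.

```python
import hashlib

TLDS = (".com", ".net", ".org")  # hypothetical hard-coded TLD list


def toy_dga(seed: str, count: int = 500, length: int = 10) -> list[str]:
    """Derive `count` pseudorandom domain names from `seed` (illustrative only)."""
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}:{i}".encode()).digest()
        # Map the first `length` bytes of the hash to lowercase letters.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        # Use the last byte to pick one of the hard-coded TLDs.
        domains.append(label + TLDS[digest[-1] % len(TLDS)])
    return domains


bot_list = toy_dga("2023-09-06")  # computed on the infected device
c2_list = toy_dga("2023-09-06")   # computed by the botnet operator
assert bot_list == c2_list and len(bot_list) == 500
```

Calling `toy_dga` with a different seed string produces an entirely new set of 500 names, which is the "process begins anew" step.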

The top-level domains (TLDs) are hard-coded and mostly limited to TLDs that are cheap to acquire.

There are many different DGAs in existence. Once the security community discovers a new algorithm (and sometimes manages to reverse engineer it), it is typically given a “family name.” Some of the most well-known DGA families are Conficker, Mirai, and CryptoLocker.

The history of DGAs 

Malware like botnets, crimeware, and ransomware needs to communicate with its infected devices. Before DGAs came into existence, the malware authors simply hard-coded the domain, or a list of domains, into the malware code. Infected machines would then regularly try to connect to these hard-coded domains to establish communication with the C2 server.

Once security teams got hold of the malware’s source code, it was a simple task to put all of those hard-coded domains onto a blocklist.

Researchers: 1. Bad guys: 0.

The first malware family to implement DGAs was the Kraken family in early 2008. However, later that year, the Conficker family would make DGAs popular.

Conficker.A generated 250 domain names per day. Conficker.C then upped the ante with a massive 50,000 domains per day. Security teams suddenly had to detect and block hordes of new domains each day, while the malicious actors still needed to control just a single one of those domain names each day.

Researchers: 1. Bad guys: 1.

Increasing the robustness of C2 communications

DGAs made it possible to increase the robustness of C2 communication, enabling further development of: 

  • Distributed denial-of-service (DDoS) attacks

  • Cryptomining

  • Selling sensitive information from compromised devices

  • Spyware

  • Advertising and email fraud

  • Self-spreading of malware

These are some of the campaigns that continue to plague the cybersecurity community to this day. DGAs have proven very effective.

What are dynamic seeds and static seeds?

There are two main categories of DGAs: dynamically seeded and statically seeded. To understand the difference, we must first understand the concept of a “seed.”

The seed is essentially a starting input for a pseudorandom number generator (PRNG). The seed has a direct impact on the output of any algorithm that uses a PRNG.

For example, a specific DGA family using a seed of 42 will always output the exact same list of domain names. Changing the seed to something else, such as 50, will lead to a completely different output.
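A small sketch of this property, using Python's standard `random.Random` as the PRNG. The function `domains_from_seed` is a hypothetical illustration, not any real family's algorithm.

```python
import random
import string


def domains_from_seed(seed: int, count: int = 3, length: int = 8) -> list[str]:
    """Hypothetical DGA core: every character comes from a PRNG initialized with `seed`."""
    rng = random.Random(seed)  # private PRNG instance, so global random state is untouched
    return [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(length)) + ".com"
        for _ in range(count)
    ]


print(domains_from_seed(42) == domains_from_seed(42))  # True: same seed, same output
print(domains_from_seed(42))
print(domains_from_seed(50))  # a completely different list
```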

As you can imagine, the seed plays a critical role for DGAs. Infected botnet devices not only need to use the same DGA algorithm as the C2 servers they need to contact, but also need to use the same seed.

DGA seeds can be generated in various ways and from various sources.

When the DGA seeds do not change over time (they are often hard-coded), we call them statically seeded DGAs.

Some DGAs use seeds that change over time. We call these dynamically seeded DGAs.

Statically seeded DGAs

Static seeds can be random numbers, celebrity names, the Declaration of Independence, a dictionary of words, or anything that a malicious actor can swap for something else with ease.

These seeds typically remain constant for a long period and generate a consistent sequence of domain names.

These DGA and seed combinations stay effective only as long as the algorithm is not reverse engineered and the seed is not discovered by cybersecurity researchers. When that point is reached, all the generated domain names will be rapidly put on blocklists. The malicious actor will then have to change the seed to generate a new list of domain names.
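As a hypothetical illustration (the word lists and `static_dga` are ours, not taken from any real family), a statically seeded DGA has no time input. Recovering the algorithm and seed once is therefore enough to enumerate every domain it will ever produce, and the actor's only recourse is a new seed.

```python
import hashlib

# Hypothetical static seed: a word list hard-coded into the malware binary.
SEED_WORDS = ["orbit", "maple", "lantern", "quartz", "harbor", "velvet"]


def static_dga(words: list[str], count: int = 5) -> list[str]:
    """Build domains by pairing seed words; no time input, so the output never changes."""
    domains = []
    for i in range(count):
        h = hashlib.sha256(str(i).encode()).digest()  # deterministic index mixing
        domains.append(f"{words[h[0] % len(words)]}{words[h[1] % len(words)]}.net")
    return domains


# Once the algorithm and seed are recovered, a single run yields the complete blocklist.
blocklist = set(static_dga(SEED_WORDS))

# The actor's countermeasure: ship a new seed, which yields an unrelated set of domains.
NEW_SEED_WORDS = ["cobalt", "ember", "tundra", "saffron", "glacier", "pixel"]
assert blocklist.isdisjoint(static_dga(NEW_SEED_WORDS))
```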

Internally, we refer to statically seeded DGAs as simply “static DGAs” and we will use this term for the rest of this post.

Dynamically seeded DGAs

Dynamically seeded DGAs (or simply “dynamic DGAs”) attempt to further complicate the life of security researchers.

Dynamic DGAs use time-dependent seeds. The current date is most commonly used, but there are also DGAs using foreign exchange (FX) rates, temperatures, and even Google Trends or Twitter trending topics.

When the seed is predictable, we security researchers can predict which domain names the DGA will produce at a certain time in the future. The condition is, of course, that the DGA family has been successfully reverse engineered. 

If the seed is based on the date, we generally see the same set of domain names in 24-hour windows (i.e., each day, just after midnight, a new set of domains is generated).

Knowing which DGA domains will activate tomorrow allows us to proactively put these domains on our blocklists to protect end users from botnets.
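The sketch below illustrates both points with a hypothetical date-seeded DGA, `dga_for_date`, standing in for a reverse-engineered algorithm. The details are our assumptions: the seed is the UTC calendar date in ISO format, and each day yields three domains. Lookups on the same UTC day see the same domains, and iterating over future dates produces a blocklist ahead of time.

```python
import hashlib
from datetime import date, datetime, timedelta, timezone


def dga_for_date(day: date, count: int = 3) -> list[str]:
    """Hypothetical stand-in for a reverse-engineered, date-seeded DGA."""
    seed = day.isoformat()  # e.g. "2023-09-06"; the time of day plays no role
    return [hashlib.sha1(f"{seed}/{i}".encode()).hexdigest()[:12] + ".com"
            for i in range(count)]


def domains_active_at(moment: datetime) -> list[str]:
    # Assumption: the malware derives the seed from the UTC calendar date.
    return dga_for_date(moment.astimezone(timezone.utc).date())


def upcoming_blocklist(today: date, days_ahead: int = 7) -> dict[str, date]:
    """Map each domain the DGA will emit over the next `days_ahead` days to its seed date."""
    blocklist = {}
    for offset in range(1, days_ahead + 1):
        day = today + timedelta(days=offset)
        for domain in dga_for_date(day):
            blocklist[domain] = day
    return blocklist


# Two lookups on the same UTC day see the same domains; the set changes after midnight.
early = datetime(2023, 9, 6, 0, 5, tzinfo=timezone.utc)
late = datetime(2023, 9, 6, 23, 55, tzinfo=timezone.utc)
assert domains_active_at(early) == domains_active_at(late)

blocklist = upcoming_blocklist(date(2023, 9, 6))  # 7 days x 3 domains
```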

Unfortunately, that scenario isn’t possible with unpredictable seeds, such as Google Trends, temperatures, or foreign exchange rates. Even if we have the source code of the family, we are not able to correctly predict future-generated DGA domain names.

Dynamic DGAs: Expectation vs. reality

Our research team has observed and investigated unexpected behavior in more than a dozen DGAs. Let’s look at the behavior of two especially interesting ones.

Both examples are dynamic DGA families that use the date as the seed. This means that by combining the seed (the date) with the reverse-engineered DGA, we should be able to predict which domain names will appear in DNS query logs, and when.

We will compare our predictions with what we actually saw in the DNS traffic data.

For the sake of brevity, throughout the rest of this section we will simply use “DGA” or “DGA family” as shorthand for “dynamic DGA families that use the date as the seed.”

Fig. 1: A generalized view of DGAs in traffic (y-axis: unique number of domains seen in traffic)

A view of DGAs in traffic data

Figure 1 gives us a generalized view of the DGAs in our traffic data. To properly convey the intuition behind this, we need a little bit of context.

First, let's define the axes.

  • The x-axis represents the time difference (in days) between the expected date (the seed date) and the date on which we observe domain names from the DGA family in DNS traffic data.

  • The y-axis is the number of unique domains seen in traffic data.
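One way to compute the two axes from query logs is sketched below. The input format (a list of `(domain, seed_date, observed_at)` tuples), the use of whole calendar days for the offset, and the assumption that timestamps share the seed date's time zone are ours; the sample records are invented for illustration.

```python
from collections import defaultdict
from datetime import date, datetime


def unique_domains_by_offset(observations):
    """observations: iterable of (domain, seed_date, observed_at) tuples.

    Returns {day_offset: number_of_unique_domains}, where day_offset is the
    observed date minus the seed date in whole days (negative = seen early).
    Assumes observed_at is in the same time zone as the seed date.
    """
    domains_per_offset = defaultdict(set)
    for domain, seed_date, observed_at in observations:
        offset = (observed_at.date() - seed_date).days
        domains_per_offset[offset].add(domain)  # a set counts repeat queries once
    return {off: len(doms) for off, doms in sorted(domains_per_offset.items())}


obs = [
    ("a1.example", date(2023, 9, 6), datetime(2023, 9, 6, 3, 0)),   # on time
    ("a1.example", date(2023, 9, 6), datetime(2023, 9, 6, 9, 0)),   # repeat query
    ("b2.example", date(2023, 9, 6), datetime(2023, 9, 4, 12, 0)),  # two days early
]
print(unique_domains_by_offset(obs))  # {-2: 1, 0: 1}
```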

We expect the seed to change every 24 hours: each day, just after midnight, the DGA activates a new set of domain names derived from the new seed, so each set should be visible for a 24-hour window before it is replaced. The red bar represents this expectation, showing what we would see from these DGA families in an ideal world without latency.

To the right, represented by the purple bar, is what we expect to see when we account for latency at various stages before the DNS data reaches our systems. Most latencies cause only a slight shift to the right, typically measured in minutes or hours rather than days, unless the delay is by design.

To the left, however, is something unexpected, represented by the green bar. What is going on here? We observe the DGA domain names before their theoretical generation dates!

This odd behavior suggests that the malicious actors have modified these DGAs to further complicate detection and protect their malicious activities.

Pushdo family traffic

Fig. 2: Pushdo malware family (y-axis: unique number of domains seen in traffic)

For the Pushdo family, we expect to see all the queried domains in a 24-hour window between 0 and 1 days on the x-axis (Figure 2). This is represented by the red shaded area.

What we actually observe is a distribution of unique domain names in traffic ranging from −50 to +50 days around their expected date. The peak lies at 10,000, just before the zero mark.

It looks like the seed (the date) was shifted up to 50 days through something that looks like a normal distribution.

Python code for this could look something like:

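  import numpy as np
  from datetime import date, timedelta

  rng = np.random.default_rng()

  seed = date.today()  # nominal seed: the current date
  # Draw a shift from a normal distribution (mean 0, standard deviation 15 days);
  # int() truncates it toward zero to a whole number of days
  shift = int(rng.normal(loc=0, scale=15))
  modified_seed = seed + timedelta(days=shift)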

We interpret this as an attempt from a malicious actor to frustrate or confuse security researchers.

Luckily, it doesn’t confuse us! Our DGA detection systems cover the entire spectrum visible in Figure 2.
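The general idea of covering such a window can be sketched as follows. This is a minimal illustration under our own assumptions, not a description of our production detection systems: `dga_for_date` is a hypothetical stand-in for the reverse-engineered Pushdo algorithm, and `build_window_index` precomputes the domains for every seed date within ±50 days of today, so a domain is recognized whatever shift was applied to its seed.

```python
import hashlib
from datetime import date, timedelta


def dga_for_date(day: date, count: int = 3) -> list[str]:
    """Hypothetical stand-in for a reverse-engineered date-seeded DGA (not the real Pushdo code)."""
    return [hashlib.sha1(f"{day.isoformat()}/{i}".encode()).hexdigest()[:12] + ".com"
            for i in range(count)]


def build_window_index(today: date, max_shift: int = 50) -> dict[str, int]:
    """Map every domain generated for seed dates within ±max_shift days of today to its shift."""
    index = {}
    for shift in range(-max_shift, max_shift + 1):
        for domain in dga_for_date(today + timedelta(days=shift)):
            index[domain] = shift
    return index


today = date(2023, 9, 6)
index = build_window_index(today)  # 101 seed dates x 3 domains each

# A domain whose seed was shifted 37 days into the past is still recognized.
probe = dga_for_date(today - timedelta(days=37))[0]
print(index.get(probe))  # -37
```

The cost of the wider window is simply more precomputed domains (here 101 seed dates instead of 1), which is cheap compared with missing shifted traffic.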

Necurs family traffic

Fig. 3: Necurs malware family (y-axis: unique number of domains seen in traffic)

For the Necurs family, we see a distribution of unique domain names from −7 to +7 days (Figure 3). There is also a much smaller spike around the +12-day mark, but it’s large enough to be considered the product of design.

This suggests that a subset of malicious actors reuse the same set of domains but postpone their use until more than 7 days after the expected date.
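In the same spirit as the Pushdo snippet, the shape in Figure 3 could be produced by a mixture: most bots shift the seed by up to 7 days in either direction, while a small subset lags by about 12 days. This is a hypothetical model: the uniform spread, the 5% lagged fraction, and the fixed 12-day lag are our assumptions, not parameters recovered from Necurs.

```python
import numpy as np
from datetime import date, timedelta

rng = np.random.default_rng()


def necurs_like_shift(rng: np.random.Generator, lag_fraction: float = 0.05) -> int:
    """Hypothetical shift model: mostly within ±7 days, occasionally lagged 12 days."""
    if rng.random() < lag_fraction:
        return 12  # the small lagged subset (assumed fixed lag)
    return int(rng.integers(-7, 8))  # upper bound of integers() is exclusive


seed = date.today()  # nominal seed: the current date
modified_seed = seed + timedelta(days=necurs_like_shift(rng))
```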

Conclusion

When analyzing the activity of dynamically seeded DGAs in DNS requests, we observed some unexpected behavior. We conclude that these anomalies can be attributed to malicious actors modifying the DGA seeds in various ways. Both of the DGA families that we examined, Pushdo and Necurs, output malicious domains both before and after their expected generation dates; in the case of Pushdo, as far as 50 days before and after.

Our analysis suggests this is being done as an attempt to avoid DGA detection systems and complicate the work of security research teams. While malicious actors continue to search for ways to protect their botnets and extend the lifespan of their C2 communication channels, it is the job of security researchers to counter these measures and better identify what is real versus what is expected.

Stay tuned

You can find our breaking security research in real time by following us on Twitter.
