Risky Business: Determining Malicious Probabilities Through ASNs
by Will Rogers
Executive summary
- Using Akamai’s vast visibility into internet trends and activity, Akamai researchers have been analyzing autonomous system numbers (ASNs) to assess the risk of large swaths of the internet.
- ASNs represent the pools of IP addresses that are managed by ISPs, cloud computing companies, and multinational conglomerates, as well as smaller organizations.
- Various characteristics of an ASN, including its registered location, type of provider, and the policies that govern the usage of IPs within it, can affect the probability that attackers are found using its IPs.
- Understanding the maliciousness of an ASN allows security practitioners to make more-informed predictions about the risk of any given IP, even if it’s not a known threat.
- Malicious ASNs are more likely to contain IPs used to host phishing websites, malicious files, bots, and scanners. “Likely malicious” ASNs represent a 1 in 7 or higher probability of encountering a malicious IP.
- An analysis of traffic indicates that “likely malicious” ASNs make up fewer than 2% of all IPv4 addresses online, but they receive more than 5% of internet traffic.
- Furthermore, ASNs in the “potentially malicious” category make up fewer than 5% of all IPv4 addresses on the internet, yet they receive more than 18% of internet traffic, highlighting that malicious and legitimate traffic can be served by the same ASN.
Introduction
There are many uses for IP intelligence: firewalling or DNS post-query blocking, for example. The ability to preemptively make a “go / no go” decision on a per-IP basis is a significant defensive measure.
However, as any security practitioner would tell you, the process of determining risk scores is a challenge. VirusTotal, Aegislab, and others are typical go-to services for beginning an investigation, but is there a better way to start?
Akamai security researchers leveraged Akamai’s vast visibility into online traffic and created a comprehensive mapping of online ASNs in order to assess the probability of a malicious address appearing in an ASN. In this post, we explore what ASNs are and the impact they can have on meaningful risk scoring. We also examine how risky ASNs show up in online traffic.
The state of IP risk scoring today
IP-based blocking can be a dangerous game because of dynamic IPs, CDNs, and hosting services. People and organizations can be directly affected with varying levels of severity. For instance, mistakenly blocking an IP that serves a social media site could be a mere nuisance to one employee, but it could entirely prevent an employee who works in social media from doing their job.
On an even grander scale, blocking a single IP from a CDN or a hosting service could result in blocking thousands of websites. We have protections in place in the form of allowlisting to avoid these extreme scenarios, but we also must remember that some use cases may prioritize broader threat protection over false positives. This describes the challenge with risk scoring in general: The level of nuance goes significantly further than allow/block.
Here at Akamai, we are fortunate to have a unique view of the internet through our CDN and security businesses. This lets us address the concerns mentioned above with a wide range of data inputs, such as:
- Attacks by IPs on the Akamai Intelligent Edge Platform
- Detections derived from billions of DNS queries each day
- IP popularity derived from billions of DNS queries each day
- Other forms of internal and third-party intelligence
There are multiple sources containing both positive (legitimate IP) and negative (malicious IP) evidence for millions of IPs. We need to combine these sources to create a single score for each IP. For this, we adopt a Bayesian approach in which the complexity lies in determining the weight to apply to each source. The result is a more holistic risk score that factors in every source: Higher scores reflect compounding levels of negative evidence, while lower scores reflect less malicious evidence or a mix of positive and negative evidence. Depending on the application, a threshold can be chosen to achieve the required balance of true positives and false positives. This goes a step further than typical IP-based risk scoring, and it is where understanding ASNs and the impact they can have becomes imperative.
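As a minimal sketch of how weighted evidence from several sources could be folded into one score, the example below uses a weighted log-odds (naive Bayes style) update. This is an illustration, not a description of Akamai's actual model: the function name `combine_evidence`, the likelihood ratios, the weights, the prior, and the 0.5 threshold are all assumptions.

```python
import math

def combine_evidence(prior, sources):
    """Combine weighted evidence into one probability that an IP is malicious.

    prior   -- assumed prior probability that the IP is malicious (0 < prior < 1)
    sources -- list of (likelihood_ratio, weight) pairs; a ratio above 1 is
               negative (malicious) evidence, a ratio below 1 is positive
               (legitimate) evidence, and weight scales trust in the source
    """
    log_odds = math.log(prior / (1.0 - prior))
    for likelihood_ratio, weight in sources:
        # Each source shifts the log-odds in proportion to its weight.
        log_odds += weight * math.log(likelihood_ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical inputs: an attack detection, a DNS detection, and a
# popularity signal that counts as positive evidence.
score = combine_evidence(
    prior=0.01,
    sources=[(20.0, 0.9), (5.0, 0.6), (0.5, 0.8)],
)
should_block = score >= 0.5  # hypothetical threshold, chosen per application
```

Raising or lowering the threshold is what trades true positives against false positives; the weights are where most of the tuning effort would go.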
What is an autonomous system?
Autonomous systems are, in short, how the internet is constructed. Each autonomous system maps to a number, so colloquially we usually say ASN (for autonomous system number). Each ASN has a pool of IPs, and it is each ASN’s responsibility to route traffic within its network and across the internet, using the Border Gateway Protocol (BGP) to communicate with other ASNs.
To visualize this, think of outer space. ASNs are the galaxies, and IPs are the stars within them. For this post, we are focusing entirely on IPv4, which includes approximately 70,000 autonomous systems.
ASNs can be classified in different ways to understand their place on the internet. For example, “Characterizing the Internet Hierarchy from Multiple Vantage Points” looks at BGP tables to classify the role of ASNs from a commercial perspective. “Unveiling the Type of Relationship Between Autonomous Systems Using Deep Learning” uses deep learning to understand the types of relationships between ASNs. To understand the organizations behind an ASN, we can look at “Revealing the Autonomous System Taxonomy: the Machine Learning Approach,” which tries to automatically classify ASNs into large ISPs, small ISPs, universities, internet exchange points, and network information centers. More recently, a word2vec-style model was used in “BGP2Vec: Unveiling the Latent Characteristics of Autonomous Systems” to classify ASNs into the Center for Applied Internet Data Analysis categories of Transit/Access, Content, Education/Research, and Enterprise.
ASN size
Let’s look at the top 10 largest ASNs in terms of the size of their IPv4 address pool (Table 1). The largest ASN is owned by the United States Department of Defense Network Information Center. The remainder of the list is a mix of large ISPs, cloud computing companies, and multinational conglomerates. We exclude ASN 0 as it is reserved for the identification of nonroutable networks.
The impact of these 10 ASNs is shown further in Figure 1. These top 10 ASNs account for roughly 25% of the allocated IPv4 address space. In comparison, only 16% of the IPv4 address space is allocated to the 69,000 ASNs outside the top 1,000, so there is a long tail of ASNs, similar to the long tail of domains or IPs in internet traffic. That is a massive share of the internet allocated to a relatively small percentage of ASNs. This is part of the reason we have to factor several data inputs into determining an effective ASN risk score. As you will learn later in this post, not all ASNs are created equal.
Geography
Location also plays a significant role in truly understanding ASNs. To showcase this, we have built a scatterplot with the number of ASNs associated with each country versus the IP pool size of those ASNs (Figure 2).
With six of the top 10 ASNs, the United States leads by a wide margin in both total number of IPs and total ASN count. The United States is followed by the other countries in the upper right corner: Brazil, Great Britain, Denmark, India, and Russia. Moving a little more to the left, we can see that China, Japan, and Korea have a much lower number of ASNs, but their pool size is higher than that of countries like Brazil and Russia. Considering what we know about the disparity between the top 1,000 ASNs and the other 69,000, this can tell us quite a bit about the unique challenges of creating a risk score for an IP coming out of a given country.
For example, for countries with a lower number of ASNs but more IPs, malicious IPs are more likely to be in the same ASN as legitimate IPs, which means it may be more difficult to avoid false positives when blocking based on ASN alone. Using Akamai DNS data, we see that all resolved IPs associated with Chinese ASNs come from 383 ASNs, whereas the equivalent figure for Russia is 2,311 ASNs. Therefore, there is a disparity of impact between blocking an ASN in China versus Russia. Blocking an entire ASN in China would take out an entire section of the internet, for instance, whereas blocking an entire ASN in Russia would take out significantly less in comparison. This is why we cannot rely on ASN-level decisions alone and still need IP-based risk scoring, and why we factored that in when we created the ASN risk scoring we discuss next.
ASN risk scoring
Our definition of ASN reputation is simple:
The reputation of an ASN measures the probability of any given active IP in that ASN being malicious.
To calculate this probability, we extend the Bayesian approach discussed earlier by feeding the scores and weights of each IP upward to the ASN to which it belongs. Akamai data is used to estimate the number of active IPs in the ASN. Because we aggregate over many IPs for each ASN, we expect our ASN reputation calculations to benefit from the law of large numbers and produce more reliable risk scores. For instance, if an ASN has two IPs that each have a 30% chance of being malicious, there is still a roughly 49% chance (0.7 × 0.7) that neither IP is malicious, so the ASN itself would not be assessed as malicious. This is in stark contrast to an ASN with 1,000 IPs that each have a 30% malicious likelihood. In this case, we would expect about 30% of the IPs within that ASN to be malicious, making the ASN significantly more dangerous than in the first example.
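The two-IP versus 1,000-IP comparison can be checked with a few lines of arithmetic. The sketch below assumes the IPs are independent, which is a simplification, and the helper names are hypothetical. It shows why the same per-IP probability is weak evidence for a tiny ASN and strong evidence for a large one.

```python
import math

def prob_none_malicious(ip_probs):
    """Probability that no IP is malicious, assuming independent IPs."""
    result = 1.0
    for p in ip_probs:
        result *= 1.0 - p
    return result

def expected_malicious_fraction(ip_probs):
    """Expected share of IPs in the ASN that are malicious."""
    return sum(ip_probs) / len(ip_probs)

def fraction_std(ip_probs):
    """Standard deviation of the malicious share, assuming independent IPs."""
    n = len(ip_probs)
    variance = sum(p * (1.0 - p) for p in ip_probs) / (n * n)
    return math.sqrt(variance)

small_asn = [0.3] * 2      # two IPs, each 30% likely to be malicious
large_asn = [0.3] * 1000   # 1,000 IPs, each 30% likely to be malicious

for name, probs in (("small", small_asn), ("large", large_asn)):
    print(
        name,
        prob_none_malicious(probs),
        expected_malicious_fraction(probs),
        fraction_std(probs),
    )
```

Both ASNs have the same expected malicious share, but the spread around that share shrinks as the number of IPs grows, which is the law-of-large-numbers effect described above.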
It is important to keep track of what we do not know and of the biases that exist in our data. There will be ASNs for which we have a lot of data, as well as ASNs for which there is little data. To account for this, we record a second number, evidence quantity, which reflects the volume of data we have and can also be used to calculate a distribution of probable ASN risk score values. In essence, there is a significant difference between an IP that we assume to be benign and an IP that we know is benign (and, even further, one that we know is malicious). When we have less data we may assume a low risk score, but in this case we also record a low evidence quantity to contextualize the result. Both of these numbers should be used when making decisions about the ASN.
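One common way to pair a score with an evidence quantity is a Beta distribution, where the amount of evidence controls how wide the distribution of plausible scores is. The sketch below is only an illustration under that assumption; the post does not state which distribution is used, and `beta_summary`, the uniform prior, and the example counts are hypothetical.

```python
import math

def beta_summary(malicious_weight, benign_weight, prior_a=1.0, prior_b=1.0):
    """Summarize an ASN as (mean risk, spread, evidence quantity).

    malicious_weight / benign_weight -- total weighted evidence of each kind
    prior_a / prior_b                -- assumed Beta prior (uniform by default)
    """
    a = prior_a + malicious_weight
    b = prior_b + benign_weight
    mean = a / (a + b)
    # Variance of a Beta(a, b) distribution.
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1.0))
    evidence_quantity = malicious_weight + benign_weight
    return mean, math.sqrt(variance), evidence_quantity

# Hypothetical ASNs with the same 10% malicious share of evidence,
# but very different amounts of data behind the estimate.
sparse = beta_summary(malicious_weight=1.0, benign_weight=9.0)
rich = beta_summary(malicious_weight=100.0, benign_weight=900.0)
```

The sparse ASN gets a much wider spread than the data-rich one, so a decision rule can require both a high risk score and enough evidence before acting.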
We are always conscious that ground truth is hard to find in security research, but we are simultaneously confident that our ASN reputation scoring can be impactful.
ASN risk score landscape
Figure 3 shows the risk score of each ASN plotted against the log of pool size. We see that as the risk score gets worse, the pool size of the ASN is less likely to be large. To make this information easier to digest we have split ASN reputation into four groups:
- Benign: extremely low chance of malicious activity within the ASN
- Likely benign: predominantly benign; relatively low chance of malicious activity within the ASN
- Potentially malicious: should exercise caution; mostly benign, with some malicious activity detected
- Likely malicious: should exercise caution or avoid; the ratio of malicious activity to benign indicates the ASN is predominantly malicious or lacks sufficient control for malicious activity
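The four groups can be expressed as cut-offs on the ASN risk score. In the sketch below, only the “likely malicious” boundary of 1 in 7 comes from the executive summary; the other two cut-offs and the function name `categorize` are hypothetical values chosen for illustration.

```python
LIKELY_MALICIOUS_MIN = 1.0 / 7.0   # stated in the executive summary
POTENTIALLY_MALICIOUS_MIN = 0.02   # hypothetical, for illustration only
LIKELY_BENIGN_MIN = 0.001          # hypothetical, for illustration only

def categorize(risk_score):
    """Map an ASN risk score (a probability) to one of the four groups."""
    if risk_score >= LIKELY_MALICIOUS_MIN:
        return "likely malicious"
    if risk_score >= POTENTIALLY_MALICIOUS_MIN:
        return "potentially malicious"
    if risk_score >= LIKELY_BENIGN_MIN:
        return "likely benign"
    return "benign"

labels = [categorize(score) for score in (0.0, 0.005, 0.05, 0.2)]
```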
The top 10 ASNs in terms of IPv4 address space (see Table 1) all reside in the bottom right corner of the plot above. This means they mostly have good reputations with a few exceptions. We have highlighted ISPs, mobile network operators (MNOs), and hosting service providers in the plot that are outliers. These have higher risk scores than similarly large ASNs. The data shows us that this is driven by a variety of threat types. It appears that there is something underlying these ASNs that makes them riskier. We must also remember that the relative volume of malicious activity is small with respect to these very large ASNs — there are smaller ASNs that are much riskier.
The Akamai ASNs called out in Figure 3 all have a “benign” risk score. But what would make one CDN more risky than another? This could be attributed to a lower barrier to entry for attackers, such as less effective monitoring or fewer controls.
We have also highlighted two groups totalling 13 ISP/MNO ASNs in Figure 3. These stand out on the plot as they are detached from similarly sized ASNs; that is, their risk scores are significantly higher. When we look a little deeper, we mostly see a mix of bot and scanning activity from these ASNs. We could speculate that these ASNs are more susceptible to malware infections.
As we move further to the upper left of the plot, we come to the riskier part of the long ASN tail with a variety of threat types. Generally, we see IP addresses that are being used to host malicious activities such as phishing websites, malicious files, or scanners. In some cases, we were able to see TOR network exit nodes, which can be a proxy for malicious activities.
Risky ASNs in online traffic
So far, we have looked at ASN risk scores in terms of IP space, but how often are risky ASNs seen in terms of online traffic? Figure 4 shows a subset of Akamai DNS traffic for a single day containing roughly 60 billion queries. A total of 76.6% of queries resolve to IPs in ASNs in the “benign” and “likely benign” categories and 18.1% resolve to IPs in the “potentially malicious” category, further emphasizing that legitimate and illegitimate services can operate from the same ASN. The remaining 5.3% of these DNS queries resolve to an IP from an ASN with a risk score in the “likely malicious” category, showing the potential of ASN-based blocking in highly locked-down use cases in which the focus is on identifying true positives rather than avoiding false positives.
Taking action
Actioning IP intelligence is all about understanding risk. In some scenarios, precision is the biggest concern — we want to minimize the number of blocked legitimate services. In other cases — for example, in highly controlled environments — recall is most important, and we want to protect against all threats, even if that means some false positives.
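The precision and recall trade-off can be made concrete by evaluating a blocking threshold against labeled examples. The data below is hypothetical and exists only to show the mechanics; `precision_recall` is an illustrative helper, not part of any Akamai tooling.

```python
def precision_recall(scored, threshold):
    """Return (precision, recall) when blocking every score >= threshold.

    scored -- list of (risk_score, is_malicious) pairs
    """
    tp = sum(1 for s, bad in scored if s >= threshold and bad)
    fp = sum(1 for s, bad in scored if s >= threshold and not bad)
    fn = sum(1 for s, bad in scored if s < threshold and bad)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Hypothetical labeled IPs: (risk score, known to be malicious?)
samples = [
    (0.95, True), (0.80, True), (0.60, False),
    (0.40, True), (0.20, False), (0.05, False),
]

strict = precision_recall(samples, threshold=0.7)   # favors precision
lenient = precision_recall(samples, threshold=0.3)  # favors recall
```

With this toy data, the strict threshold blocks no legitimate IPs but misses one threat, while the lenient threshold catches every threat at the cost of one false positive.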
Prior assumptions
Without ASN reputation, our assumption before looking at evidence would be that all IPs are legitimate. ASN reputation unlocks the ability to take a hierarchical approach to these prior assumptions. If the ASN of the IP is very bad, this can be the starting point for our intelligence and we can update our data as we collect more evidence. The score of other IPs in the ASN will affect the IP’s final score and could influence whether it is blocked.
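One way to picture this hierarchical prior is to treat the ASN reputation as a starting estimate that IP-level observations gradually override. The sketch below uses a Beta-style pseudo-count update; the function `ip_score`, the `prior_strength` parameter, and all numbers are assumptions for illustration rather than the method used in production.

```python
def ip_score(asn_reputation, prior_strength, malicious_obs, benign_obs):
    """Estimate an IP's risk, starting from its ASN's reputation.

    asn_reputation -- probability that an active IP in the ASN is malicious
    prior_strength -- hypothetical pseudo-count: how many observations the
                      ASN-level prior is worth
    malicious_obs / benign_obs -- weighted evidence seen for this IP
    """
    a = asn_reputation * prior_strength + malicious_obs
    b = (1.0 - asn_reputation) * prior_strength + benign_obs
    return a / (a + b)

# With no evidence, the IP simply inherits the ASN's reputation.
unknown_ip_bad_asn = ip_score(0.2, 10.0, 0.0, 0.0)

# The same single malicious observation weighs differently by ASN.
flagged_ip_bad_asn = ip_score(0.2, 10.0, 1.0, 0.0)
flagged_ip_good_asn = ip_score(0.001, 10.0, 1.0, 0.0)

# Plenty of benign evidence pulls an IP's score down even in a risky ASN.
trusted_ip_bad_asn = ip_score(0.2, 10.0, 0.0, 40.0)
```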
Blocking entire ASNs
In highly locked-down environments it may be desirable to block entire ASNs based on their reputation. Since an ASN’s reputation is constantly evaluated, this list does not need to be static. New ASNs will be automatically added as their reputation deteriorates. Similarly, blocking can be relaxed as we see evidence that an ASN is less of a threat.
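A dynamic blocklist of this kind can be rebuilt from the latest reputation data on every refresh. The sketch below is hypothetical: the `build_blocklist` helper, the minimum evidence value, and the sample data are assumptions, and the ASNs come from the range reserved for documentation (64496 to 64511). The default risk threshold reuses the 1 in 7 figure from the executive summary.

```python
def build_blocklist(reputations, risk_threshold=1.0 / 7.0, min_evidence=50.0):
    """Return the set of ASNs to block given current reputation data.

    reputations -- dict mapping ASN -> (risk_score, evidence_quantity)
    min_evidence -- hypothetical floor so sparse data does not trigger a block
    """
    return {
        asn
        for asn, (risk, evidence) in reputations.items()
        if risk >= risk_threshold and evidence >= min_evidence
    }

# Hypothetical snapshots on two consecutive days.
day_one = {64496: (0.30, 400.0), 64497: (0.02, 900.0), 64498: (0.40, 5.0)}
day_two = {64496: (0.05, 450.0), 64497: (0.02, 950.0), 64498: (0.40, 120.0)}

blocked_before = build_blocklist(day_one)
blocked_after = build_blocklist(day_two)

newly_blocked = blocked_after - blocked_before  # reputation deteriorated
released = blocked_before - blocked_after       # reputation improved
```

Requiring a minimum evidence quantity keeps an ASN with a high score but very little data from being blocked prematurely.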
Conclusion
There will always be an element of nuance when discussing risk scoring, but ASN risk scoring has the potential to revolutionize the industry’s approach. Just like every other aspect of security, there isn’t a silver bullet, and each organization has to make decisions that best suit its environment. But the ability to go a step beyond the solely IP-based risk scoring methodology allows for a more proactive defense model in your allowlisting and blocklisting. By applying a “guilt by association” model to an IP that is part of an ASN with a high likelihood of being malicious, you could preemptively block a threat even if it’s unknown to major IP intelligence feeds. This gives our customers another leg up in their defense strategy. And that’s why we’re here: to help secure life online.
The team at Akamai will continue to monitor and tweak this scoring as we gather more intelligence on this topic, and we’ll update you in future blog posts. To make sure you don’t miss these updates or any other new security research, follow us on Twitter at @Akamai_Research.