
AkaRank: Improving Popularity Rankings for Better Threat Intelligence, Part 2


Written by Noy Aizenberg and Congcong Xing

May 02, 2023

Noy Aizenberg is a Software Engineer at Akamai Technologies.

Congcong Xing is a Data Analyst at Akamai Technologies.

AkaRank is very stable — in fact, more stable than the other popularity lists — and AkaRank has good diversity and distribution with moderate geolocation biases.

Introduction

Part 1 of this series on improving popularity rank lists examined the shutdown of the widely used Alexa Top 1 Million list of popular websites. Akamai security researchers had relied on the Alexa list to help eliminate false positives from blocklists. Our threat intelligence is widely used to protect enterprises and subscribers in ISP and MNO networks around the world, and false positives disrupt the subscriber experience.

That post also discussed the limitations of other popularity rank lists introduced over the past few years, and it introduced AkaRank, a new popularity ranking product developed at Akamai. AkaRank is our solution to fill the void left by Alexa.

In this post, we will compare AkaRank with several alternatives using a variety of known analytics methods, and discuss why we think AkaRank improves the quality of our threat research. 

Comparative products used in this analysis

For this analysis, we compare AkaRank with the Tranco and Cisco Umbrella popularity lists. Tranco originally combined the Alexa, Umbrella, and Majestic lists, and it integrated the Farsight ranking list once Alexa stopped publishing new updates on May 1, 2022. We included the Umbrella list because, like AkaRank, it is based on analysis of DNS traffic (AkaRank uses anonymized DNS traffic).

Stability over time

The research paper Evaluating the Long-term Effects of Parameters on the Characteristics of the Tranco Top Sites Ranking by the imec-DistriNet Research Group concluded that the Tranco integrated list had stability advantages over Alexa.  

The left graph in Figure 1 shows that Tranco is the most stable of the four popularity lists evaluated. The right graph in Figure 1 shows that Tranco was also the most stable across the top 1000, 10K, 100K, and 1M subsets, with an average daily change below 1%, although the top 100 and top 10K showed notable volatility in mid-December.

As can be seen in the figure, and as noted in our previous post, there is a weekly variance due to the weekend effect.

Fig. 1: (Left) The stability of four popularity rank list products measured as the percent difference in rankings over two consecutive days. (Right) The same measurement for different subsets of the popularity rank list: top 10, top 100, etc.
Fig. 2: The stability of AkaRank

Figure 2 compares AkaRank with the other lists using the same measurement techniques from the imec-DistriNet research paper. It shows the Tranco and AkaRank lists have similar stability behaviors over time. The most stable part of the AkaRank list is the top 1000 with 0.27% daily change on average. 

Changes in the top 10 and top 1000 are less important, because these domains would never make it onto threat lists; changes in the top 10K and top 100K matter more, because that is where the data relevant to security research tends to be found. For AkaRank, the one-month check showed a 1.19% change among the top 100K, which is considered the most important part of a popularity list.

The maximum daily change, 4.41%, occurred in the top 1M; however, according to the 2019 paper Clustering and the Weekend Effect: Recommendations for the Use of Top Domain Lists in Security Research, "Ranks below 100 k are not statistically meaningful because the data collected about those domains is too scarce."
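The stability figures above are averages of a percent daily change computed per top-N subset. The imec-DistriNet paper defines its own metric; as a minimal sketch, assuming "daily change" means the percentage of domains in one day's top N that are absent from the next day's top N, the calculation could look like this. The function names and the toy rankings are hypothetical.

```python
from typing import Sequence


def daily_change(day1: Sequence[str], day2: Sequence[str], top_n: int) -> float:
    """Percentage of day1's top-N domains that are absent from day2's top N.

    Assumption: each list is ordered by rank (index 0 is rank 1) and has no
    duplicates. Reordering inside the top N is not counted as change.
    """
    before = set(day1[:top_n])
    after = set(day2[:top_n])
    if not before:
        return 0.0
    return 100.0 * len(before - after) / len(before)


def average_daily_change(snapshots: Sequence[Sequence[str]], top_n: int) -> float:
    """Mean daily change over every pair of consecutive daily snapshots."""
    changes = [
        daily_change(prev, curr, top_n)
        for prev, curr in zip(snapshots, snapshots[1:])
    ]
    return sum(changes) / len(changes) if changes else 0.0


if __name__ == "__main__":
    # Hypothetical rankings for three consecutive days.
    days = [
        ["a.com", "b.com", "c.com", "d.com"],
        ["a.com", "c.com", "b.com", "e.com"],
        ["a.com", "c.com", "e.com", "b.com"],
    ]
    for n in (2, 4):
        print(f"top {n}: {average_daily_change(days, n):.2f}% average daily change")
```

With this definition, one domain leaving the top 10 moves the metric by 10 percentage points, while the same event in the top 100K barely registers, so small subsets naturally look more volatile.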

Understanding the distribution of TLDs

TLD is an abbreviation of top-level domain, the last segment of the domain name.

There are many types of TLDs, so let’s focus on the relevant ones only.

  • gTLDs: generic top-level domains (for example, [.com], [.net], [.org])

  • ccTLDs: country code top-level domains (for example, .us [United States], .il [Israel])

  • other TLDs: such as sponsored top-level domains (sTLDs), the infrastructure top-level domain (.arpa), and test top-level domains (tTLDs)

  • invalid TLDs: TLDs that are not in the IANA TLD list (for example, .ip and .col from the Umbrella list)
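The four categories above can be assigned mechanically from a domain's last label. The following sketch assumes a locally saved copy of IANA's plain-text TLD list (one upper-case TLD per line, comment lines starting with #); the sample text and the small `OTHER_TLD_TYPES` mapping are hypothetical stand-ins, and a complete implementation would take each TLD's type from the IANA Root Zone Database.

```python
from collections import Counter

# Hypothetical excerpt in the format of IANA's plain-text TLD list.
SAMPLE_IANA_TEXT = """# Version 0000000000, hypothetical sample
COM
NET
ORG
RU
US
IL
UK
ARPA
AERO
MUSEUM
CLOUD
"""

# Assumption: a small, partial mapping of TLDs that are neither generic nor
# country code. The authoritative type of each TLD is in IANA's root zone data.
OTHER_TLD_TYPES = {"arpa": "infrastructure", "aero": "sponsored", "museum": "sponsored"}


def load_iana_tlds(text: str) -> set[str]:
    """Parse the IANA TLD list format into a set of lower-case TLDs."""
    return {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def classify_tld(domain: str, valid_tlds: set[str]) -> str:
    """Return 'gTLD', 'ccTLD', 'other', or 'invalid' for a domain's TLD."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    if tld not in valid_tlds:
        return "invalid"
    if tld in OTHER_TLD_TYPES:
        return "other"
    # Two-letter ASCII TLDs are country codes. IDN ccTLDs (xn--...) are not
    # handled by this simplification and fall through to 'gTLD'.
    if len(tld) == 2 and tld.isascii() and tld.isalpha():
        return "ccTLD"
    return "gTLD"


if __name__ == "__main__":
    tlds = load_iana_tlds(SAMPLE_IANA_TEXT)
    sample = ["google.com", "gov.il", "example.ip", "in-addr.arpa"]
    print(Counter(classify_tld(d, tlds) for d in sample))
```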

According to Figure 3, the AkaRank list is less diverse in generic TLDs than Tranco, but both have a similar number of distinct ccTLDs. Domains with ccTLDs make up 26.9% of AkaRank and 20.2% of Tranco.

According to the most recent Verisign Domain Name Industry Brief, from Q4 2022, ccTLD domain name registrations totaled 132.4 million (~38% of total registrations). New gTLD registrations totaled 27.3 million (~8%). Registrations for .com and .net dominated with 174.2 million (~50%). The remainder are other legacy gTLDs (such as .org) and the other TLD types described above.

Fig. 3: Unique TLDs by type over three popularity products (single-day snapshot)

As we can see in Figure 4, domains with generic TLDs make up 33.2% to 71.1% of the lists, and domains with ccTLDs make up 17% to 26.9%, which is less than the roughly 38% share of ccTLDs among all domain registrations. Approximately half of the Umbrella list consists of domains with invalid TLDs that are not in the IANA TLD list, for example, [.ip] and [.col].

This gap is consistent with the common-sense view that most popular domains serve a worldwide audience and therefore use gTLDs rather than ccTLDs. Some ccTLD domains from large countries are popular, too; for example, [microsoft.us] was ranked 963 by AkaRank on February 15, 2023.

Fig. 4: Percentage of domains by TLD (from single-day snapshot)

According to Figures 5 and 6, the most represented TLDs are [.com], [.org], and [.net], which together account for 58% to 76.5% of the domains in the popularity lists. In the Tranco list, [.com], [.org], and [.net] make up 58% of the domains, but the third largest TLD is [.ru]. In Cisco Umbrella, only 23.5% of the 1M list consists of domains with TLDs other than [.com], [.org], and [.net]; in Tranco this percentage is 42%, and in AkaRank it is 28.6%.

As we can see from Figure 6, Tranco has a large number of [.ru] domains, more than its [.org] domains, suggesting that Tranco's data sources were biased toward Russian domains at the time of this analysis.

Fig. 5: Cumulative distribution functions of TLD usage across the lists (from single-day snapshot)
Fig. 6: Top 10 TLDs across the three popularity lists
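The TLD shares and cumulative distributions discussed above come down to counting domains per TLD in a snapshot. A minimal sketch, assuming each list is available as a plain sequence of domain names from a single-day snapshot; the helper names and toy data are hypothetical. Note that counting by the last label files bbc.co.uk under .uk, consistent with treating .uk as the ccTLD.

```python
from collections import Counter
from typing import Iterable


def tld_of(domain: str) -> str:
    """Last label of a domain name, lower-cased (e.g. 'bbc.co.uk' -> 'uk')."""
    return domain.rstrip(".").rsplit(".", 1)[-1].lower()


def tld_shares(domains: Iterable[str]) -> list[tuple[str, float]]:
    """TLDs ordered from most to least common, with their percentage share."""
    counts = Counter(tld_of(d) for d in domains)
    total = sum(counts.values())
    return [(tld, 100.0 * n / total) for tld, n in counts.most_common()]


def cumulative_shares(shares: list[tuple[str, float]]) -> list[float]:
    """CDF points: share of the list covered by the k most common TLDs."""
    cdf: list[float] = []
    running = 0.0
    for _, pct in shares:
        running += pct
        cdf.append(running)
    return cdf


def share_outside(domains: Iterable[str],
                  tlds: frozenset = frozenset({"com", "org", "net"})) -> float:
    """Percentage of domains whose TLD is not in `tlds`."""
    items = list(domains)
    if not items:
        return 0.0
    outside = sum(1 for d in items if tld_of(d) not in tlds)
    return 100.0 * outside / len(items)


if __name__ == "__main__":
    snapshot = ["a.com", "b.com", "c.org", "d.net", "e.ru", "f.ru", "g.com", "h.de"]
    shares = tld_shares(snapshot)
    print(shares[:3])
    print(cumulative_shares(shares))
    print(share_outside(snapshot))
```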

Explaining geolocation biases 

Theoretically, a good popularity list or domain ranking should fully reflect the popularity of all websites without any biases. Due to some inevitable factors and limitations, there are always biases such as algorithm/model biases, data accessibility, internet popularity among different countries/areas, political restrictions, and so forth.

Geolocation biases are represented by the distribution of ccTLDs

As we can see in Figure 7, all three lists (AkaRank, Tranco, and Umbrella) have geolocation biases, but their geolocation distributions differ. AkaRank has moderate biases, with more domains from places like Western Europe and North America. Tranco has distinct biases, with more domains from Russia and Germany. Because AkaRank takes geolocation into account, it's good to see that its results among ccTLDs are balanced.

Fig. 7: Top ccTLDs distribution among the three popularity lists

Intersections among AkaRank, Umbrella, and Tranco

Figure 8 shows that 163,826 domains (~16%) are common to all three lists. AkaRank and Umbrella have 18.93% of domains in common, and AkaRank and Tranco have 37.48% in common. A total of 599,722 domains (~60%) appear only in the AkaRank list; the highest-ranked of these is [cloudapp.net], at rank 61.

Fig. 8: Venn diagram of Umbrella, Tranco, and AkaRank
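The Venn regions in Figure 8 reduce to set operations over the three lists. A minimal sketch, assuming each list is a rank-ordered sequence of domain names and that percentages are taken relative to the 1M list size (163,826 of 1,000,000 is about 16%); the function names and toy data are hypothetical.

```python
from typing import Optional, Sequence


def venn_regions(aka: Sequence[str], umb: Sequence[str],
                 tra: Sequence[str]) -> dict[str, set[str]]:
    """Overlap regions between AkaRank and the two comparison lists."""
    a, u, t = set(aka), set(umb), set(tra)
    return {
        "all_three": a & u & t,
        "aka_and_umbrella": a & u,
        "aka_and_tranco": a & t,
        "aka_only": a - u - t,
    }


def best_ranked(domains: set[str], ranked: Sequence[str]) -> Optional[tuple[int, str]]:
    """Highest (numerically lowest) 1-based rank among `domains`, with the domain."""
    for rank, domain in enumerate(ranked, start=1):
        if domain in domains:
            return rank, domain
    return None


if __name__ == "__main__":
    aka = ["x.com", "y.com", "z.com", "w.com"]
    umb = ["x.com", "q.com"]
    tra = ["x.com", "y.com", "r.com"]
    regions = venn_regions(aka, umb, tra)
    for name, members in regions.items():
        print(name, len(members), f"{100.0 * len(members) / len(aka):.1f}%")
    print(best_ranked(regions["aka_only"], aka))
```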

Domains not covered by Umbrella and Tranco

In Figure 9, the Venn diagrams of the top 10K and top 10K to 100K of AkaRank compared with Umbrella and Tranco show that AkaRank has domains that Umbrella and Tranco don't have, especially in the AkaRank top 100K, which, as discussed earlier, is the most important part for security research.

The AkaRank top 10K, which is the most influential part of the list, overlaps about 89% with Umbrella and close to 83% with Tranco. In total, 1,101 of these domains are missing from the Umbrella list and 1,728 are missing from the Tranco list.

For AkaRank ranks 10K to 100K, the overlap with Umbrella is close to 60%, and the overlap with Tranco is close to 67%.

Fig. 9: Venn diagrams of the top 10K and top 10K to 100K of AkaRank compared with the Umbrella 1M and Tranco 1M lists
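The per-slice overlaps in Figure 9 can be computed by cutting a rank window out of one list and checking membership against the other list's full 1M set. As a consistency check on the numbers above, 1,101 missing domains out of 10,000 corresponds to an overlap of 88.99% (about 89%), and 1,728 missing corresponds to 82.72% (about 83%). A minimal sketch; the function name and the synthetic data are hypothetical.

```python
from typing import Sequence


def slice_overlap(ranked: Sequence[str], start: int, stop: int,
                  other: set[str]) -> tuple[float, list[str]]:
    """Overlap of ranks start..stop (1-based, inclusive) of `ranked` with `other`.

    Returns the overlap percentage and the domains in the rank window that
    `other` does not contain.
    """
    window = list(ranked[start - 1:stop])
    if not window:
        return 0.0, []
    missing = [d for d in window if d not in other]
    overlap = 100.0 * (len(window) - len(missing)) / len(window)
    return overlap, missing


if __name__ == "__main__":
    # Synthetic data: 100,000 ranked names; the comparison set happens to hold
    # 8,899 of the top 10,000 and 54,000 of the names ranked 10,001-100,000.
    ranked = [f"site{i}.example" for i in range(1, 100_001)]
    other = set(ranked[:8_899]) | set(ranked[10_000:64_000])
    for start, stop in ((1, 10_000), (10_001, 100_000)):
        pct, missing = slice_overlap(ranked, start, stop, other)
        print(f"ranks {start}-{stop}: {pct:.2f}% overlap, {len(missing)} missing")
```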

Table 1 shows examples of domains that AkaRank covered but Umbrella did not, and Table 2 shows examples of domains covered by AkaRank but not Tranco. Some of these unique domains are well-used services and well-known APIs.

googleapis[.]com

Allows communication with Google Services and their integration with other services 


gstatic[.]com

Helps content on Google load faster by serving static assets from Google's content delivery network

Table 1: Examples of domains that are in the AkaRank top 100K but not in Umbrella

gallagher[.]cloud

Gallagher's cloud-based security solution

gov[.]nf

Norfolk Island’s government website

Table 2: Examples of domains that are in the AkaRank top 100K but not in Tranco

Popularity lists, especially their top 100K sublists, are very important for supporting allowlisting processes. As Tables 1 and 2 show, these domains include widely used services and APIs, as well as official government websites, that should be allowlisted because we would never want to block them. Blocking them would significantly disrupt critical services for our customers.
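One way a top-100K list can feed an allowlisting step is by suppressing blocklist entries that fall under a popular domain before the blocklist is published. The sketch below is a hypothetical illustration, not Akamai's pipeline: the function names are invented, and the policy of matching parent domains (so api.googleapis.com is covered by googleapis.com) is an assumption. A real pipeline would likely need exceptions for shared hosting domains whose subdomains belong to different tenants.

```python
from typing import Iterable, Sequence


def parent_domains(domain: str) -> list[str]:
    """The name itself plus each parent, excluding the bare TLD.

    'api.googleapis.com' -> ['api.googleapis.com', 'googleapis.com']
    """
    labels = domain.rstrip(".").lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]


def filter_blocklist(blocklist: Iterable[str], ranking: Sequence[str],
                     top_n: int = 100_000) -> tuple[list[str], list[str]]:
    """Split blocklist entries into (kept, suppressed) using a top-N allowlist."""
    allow = {d.rstrip(".").lower() for d in ranking[:top_n]}
    kept: list[str] = []
    suppressed: list[str] = []
    for entry in blocklist:
        # Suppress the entry if it, or any parent domain, is in the top N.
        if any(p in allow for p in parent_domains(entry)):
            suppressed.append(entry)
        else:
            kept.append(entry)
    return kept, suppressed


if __name__ == "__main__":
    ranking = ["googleapis.com", "gstatic.com", "gov.nf"]
    blocklist = ["api.googleapis.com", "malware.example", "gov.nf", "GSTATIC.com."]
    kept, suppressed = filter_blocklist(blocklist, ranking)
    print("kept:", kept)
    print("suppressed:", suppressed)
```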

Domains not covered by AkaRank

Figure 10 shows the domains that Umbrella and Tranco have but AkaRank does not, focusing on the top 100K, which, as discussed earlier, is the most important part for security research.

For Umbrella 10K and 10K to 100K, the overlap with AkaRank is close to 21% and 19%, respectively. For Tranco 10K and 10K to 100K, the overlap with AkaRank is close to 96% and 72%, respectively.

Fig. 10: Venn diagrams of the top 10K and top 10K to 100K of Umbrella and Tranco compared with the AkaRank 1M list

Examples of the domains that Umbrella covered but AkaRank did not cover are shown in Table 3. Table 4 shows examples of the domains covered by Tranco but not AkaRank.

www[.]google[.]com

Both Umbrella and AkaRank have [google.com], but Umbrella has [www.google.com] as well.

sc[.]zoom[.]us

Both Umbrella and AkaRank have [zoom.us], but Umbrella has [sc.zoom.us] as well.

Table 3: Examples of domains that are in the Umbrella top 100K but not in AkaRank

ohthree[.]com

CSC Corporate Domains, Inc. (ranked 300 by Tranco)

boutell[.]co[.]uk

Website of a small loan agency operating only in the United Kingdom (ranked 3227 by Tranco globally)

Table 4: Examples of domains that are in Tranco top 100K but not in AkaRank
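The Table 3 entries suggest that part of the gap is a naming difference rather than a coverage gap: www.google.com and sc.zoom.us are hostnames under domains that AkaRank already lists. A minimal sketch of separating the two cases by collapsing each name to its registrable domain before comparing. The `PUBLIC_SUFFIXES` set is a hypothetical, tiny excerpt standing in for the full Public Suffix List, and the function names are invented for illustration.

```python
from typing import Iterable

# Hypothetical excerpt; the full Public Suffix List is published at publicsuffix.org.
PUBLIC_SUFFIXES = {"com", "net", "us", "uk", "co.uk"}


def registrable_domain(fqdn: str) -> str:
    """Collapse a hostname to its registrable domain (public suffix + one label)."""
    labels = fqdn.rstrip(".").lower().split(".")
    # Scanning from the left tries the longest candidate suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            # If the whole name is itself a public suffix, return it unchanged.
            return ".".join(labels[max(i - 1, 0):])
    # Unknown suffix: fall back to the last two labels.
    return ".".join(labels[-2:])


def missing_after_normalization(umbrella: Iterable[str],
                                akarank: Iterable[str]) -> list[str]:
    """Umbrella entries whose registrable domain does not appear in AkaRank."""
    covered = {registrable_domain(d) for d in akarank}
    return [d for d in umbrella if registrable_domain(d) not in covered]


if __name__ == "__main__":
    umbrella = ["www.google.com", "sc.zoom.us", "newsite.example"]
    akarank = ["google.com", "zoom.us"]
    print(missing_after_normalization(umbrella, akarank))
```

In practice you would load the full list, or use a maintained library built on it (for example, the Python package tldextract), rather than a hand-written excerpt; the two-label fallback is a simplification.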

Conclusion

We compared AkaRank with competitors' popularity list products and assessed it along four dimensions:

  1. Stability over time

  2. Distribution of TLDs

  3. Geolocation bias

  4. Intersection of top domains

AkaRank is very stable (in fact, more stable than the other popularity lists), and it has good diversity and distribution with moderate geolocation biases. It's a robust popularity list that we use as an internal tool to protect Akamai's customers from false positives.

For real-time security research updates, follow us on Twitter.


