The Most Common Combosquatting Keyword Is “Support”
Executive summary
Cybersquatting (aka domain squatting and URL hijacking) is often used during phishing campaigns, identity theft, and malware install attempts.
Combosquatting outranks typosquatting in terms of both number of active domains and click-throughs, making it today’s biggest cybersquatting threat.
We analyzed global DNS traffic and internal malicious domain lists to identify commonly used combosquatting keywords.
We compiled a sample of the 50 most popular combosquatting keywords.
Introduction
Malicious actors are imitating brand websites on a daily basis. Their operation would be difficult to pull off in real life; for example, the necessary physical building could be tricky to obtain. Online though, these threat actors can host well-designed look-alike sites that include imitated domain names.
Often these websites are hosted on domain names that are a close approximation of the original brand’s domain. We refer to those as cybersquatted domains.
Our analysis of malicious domains has shown that combosquatting is the biggest cybersquatting threat. In this post, we present how we can detect combosquatting in DNS traffic, and list the most common combosquatting keywords attackers are using to deceive organizations and individuals alike.
What is cybersquatting?
Cybersquatted domains are domain names registered and used by malicious actors to profit from the goodwill of a brand or a name that they do not own. Cybersquatting is often seen as an enabler in campaigns that are trying to install ransomware (e.g., through malvertising), make phishing attempts, or steal someone’s identity.
Cybersquatting variants
There are multiple types of cybersquatting out there in the wild. Table 1 demonstrates the differences between them, using the fictitious brand safebank[.]com.
| Variant | Description | safebank[.]com example |
|---|---|---|
| Combosquatting | A keyword is added to the brand domain | safebank-security[.]com |
| Typosquatting | Addition, removal, or replacement of a character | safebqnk[.]com |
| Bitsquatting | Random ASCII bit flip | sagebank[.]com |
| IDN homograph | Using similar-looking characters | sǎfebank[.]com |
| TLD squatting | Replace the top-level domain (TLD) | safebank[.]co |
| Soundsquatting | Uses homophones | savebank[.]com |
| Dotsquatting | Insert one or more dots | sa.febank[.]com |
Table 1: The variants of cybersquatting
Of these variants, dotsquatting deserves an extra mention. During our review of existing literature, we could not find any mention of it. However, we saw these cases frequently enough in our data that they warranted their own category. Dotsquatting is the descriptive name our team uses for it, so that’s what we’ve called it here.
How the variants interact
There is some overlap among the various squatting types, especially between bitsquatting and typosquatting. The bitsquatting example above is a testament to this: Since “g” is next to “f” on a QWERTY keyboard, it could be also considered typosquatting.
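The overlap can be checked mechanically. The following is a minimal sketch, not our production tooling: it enumerates every label that is one bit flip away from a given label and keeps only results made of valid hostname characters. The function name `bitsquat_labels` and the `VALID` character set are assumptions made for this illustration.

```python
import string

# Characters allowed in a hostname label (letters, digits, hyphen).
VALID = set(string.ascii_lowercase + string.digits + "-")

def bitsquat_labels(label: str) -> set[str]:
    """Return every label reachable by flipping one bit of one character."""
    results = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            # Flip a single bit; DNS is case-insensitive, so normalize case.
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped in VALID and flipped != ch:
                results.add(label[:i] + flipped + label[i + 1:])
    return results

candidates = bitsquat_labels("safebank")
print("sagebank" in candidates)  # 'f' (0x66) and 'g' (0x67) differ in one bit
```

Comparing such a candidate set with a list of keyboard-adjacent substitutions is one way to see which domains fall into both categories.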
These types are not mutually exclusive. Multiple squatting types can be combined in a single domain name, such as safebank-security[.]co, which is both combosquatting and TLD squatting.
Finally, not all cybersquatting in the wild is covered by Table 1. And there will likely be even more variants in the future as these attack vectors tend to evolve.
Cybersquatting monetization
Cybersquatting has caused financial damage for many years, and it remains an enormous threat to organizations and individuals alike.
The term is also commonly used in more general contexts. Typical methods in this area are domain name warehousing and domain name frontrunning. As an example, someone purchases coke[.]net (a TLD squat) and tries to sell it to the Coca-Cola Company for a hefty profit. Another common monetization type is affiliate marketing through hit stealing: for example, registering payypal[.]com and then redirecting visitors to the genuine website through a referral code. Attempts are numerous and they often succeed. Past successes led to various laws and regulations being passed, including the U.S. Anticybersquatting Consumer Protection Act (ACPA).
Examples of successful cybersquatting
In 2023, Reddit became the victim of a highly targeted phishing campaign. The attack involved a website that cloned the behavior of Reddit’s intranet gateway and was hosted on a cybersquatted domain, which Reddit’s security incident response mentions implicitly. The malicious actors gained access to limited employee and advertiser information.
Facebook became a victim in 2011 through more than 100 look-alike domain names created from simple misspellings. Facebook was later awarded nearly $2.8 million in damages.
Consumers can also be directly targeted. In October 2022, BleepingComputer reported an extensive typosquatting campaign that aimed to get people to install malware-infected apps. The victims became infected with keyloggers and malware that stole credentials from bank accounts and cryptocurrency wallets.
The popularity of combosquatting
In our 2022 analysis, combosquatting was the most commonly observed cybersquatting type in terms of unique domain names. In other words, malicious actors are using combosquatting as part of their attack vector much more often than the other types of cybersquatting.
Combosquatting also seemed to generate the most DNS queries, with each of those queries representing a potential victim visiting a malicious domain.
These two data points make combosquatting the biggest cybersquatting threat, according to our analysis.
Typosquatting is stealing the spotlight
This insight from our team is aligned with the findings from a large-scale study from 2017 that was specifically focused on combosquatting: “We find that combosquatting domains are 100 times more prevalent than typosquatting domains.”
Despite all of this, it seems to us that typosquatting — not combosquatting — is the variant getting most of the attention in research, blogs, and magazines. As researchers, we were not able to find any data to support this attention on typosquatting. The data we’ve seen makes us believe that malicious actors are more than happy that typosquatting has the spotlight so that combosquatting can continue to slip under the radar.
Keywords in combosquatting
Remember: Combosquatting is a cybersquatting variant in which a keyword is added to the brand’s domain. Here are a few more examples for safebank[.]com:
safebank-members[.]com
mysafebank[.]com
login-safebank[.]com
But how do we define a keyword? In the examples above, keywords such as “members,” “my,” and “login” are attached to the brand “safebank” on either side, either with a hyphen (login-safebank) or by direct concatenation (mysafebank).
The keywords are meant to evoke certain feelings, as shown in Table 2.
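To make the two attachment styles concrete, here is a small sketch that, given a domain label and a known brand, reports which keyword sits on which side of the brand and how it is joined. The function name `describe_combosquat` and its return format are assumptions made for this illustration, and it only inspects the first occurrence of the brand.

```python
def describe_combosquat(label: str, brand: str) -> list[tuple[str, str, str]]:
    """Return (keyword, position, joiner) tuples for text around the brand."""
    idx = label.find(brand)
    if idx == -1:
        return []  # brand not present in this label
    before, after = label[:idx], label[idx + len(brand):]
    found = []
    if before.strip("-"):
        joiner = "hyphen" if before.endswith("-") else "concatenated"
        found.append((before.strip("-"), "prefix", joiner))
    if after.strip("-"):
        joiner = "hyphen" if after.startswith("-") else "concatenated"
        found.append((after.strip("-"), "suffix", joiner))
    return found

for label in ["safebank-members", "mysafebank", "login-safebank"]:
    print(label, describe_combosquat(label, "safebank"))
```

Note that this only works because the brand is supplied up front; without that knowledge, concatenated keywords are much harder to isolate.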
| Keywords | Feelings |
|---|---|
| Verification, account, login | Safety, authority |
| Now, alert | Urgency |
| Free, promo | Fear of missing out |
Table 2: Examples of keywords and the feelings they evoke
Brand names do not equal legitimacy
The brand name (safebank) makes the link look legitimate. Of course, the fact that a link contains a brand name does not make it safe. Nothing is stopping anyone from registering domain names containing trademarked brand names. It’s clear that the attacker’s goal here is to trigger a fast, emotional response in the user, rather than a rational one. Combining the brand with a keyword seems like a sensible way to achieve this.
Now, how do we use this knowledge to protect people?
As a cybersecurity research team, we have the most powerful weapon: data! We have access to a huge list of domains that we have flagged as malicious. It is this list that we share with our customers, and it allows them to protect end users while they browse the internet. Moreover, we can also use DNS traffic data to see keyword trends in newly observed domains, which include both benign and malicious domains.
Data-driven keyword discovery
It is important to note that the dataset we used for this analysis contains only malicious phishing domains. This allowed us to focus solely on what keywords attackers are actively using today. Every input has been flagged as phishing through multiple internal processes to ensure accuracy. In this section, we will walk you through a rough outline of how we collated the list of common keywords.
Starting the analysis
You may recall that sometimes keywords are appended to a brand by a hyphen, and sometimes they are concatenated directly.
The former keywords are easy to find: we just need to split a domain name by hyphens, as hyphens are natural delimiters. For example, take the domain amazon-e[.]com. The hyphen is a dead giveaway of what the parts are: the brand name amazon and the keyword e. Very simple.
The directly concatenated keywords, however, pose a much larger challenge. Extracting them requires knowledge of language, localized brands, and even browsing behavior. Keyword overlaps are a common example of this.
Let’s take a look at amazone[.]com. Here, we have no idea if the brand is Ama (Italian leather sneakers manufacturer), Amaz (Greek women's clothing designer), or Amazon (American tech company). This domain gives us the following options for keywords: zone, one, and e, all of which could be valid.
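The ambiguity is easy to reproduce. The sketch below lists every reading of a label in which the label begins with a known brand; the `KNOWN_BRANDS` list and the function name `candidate_splits` are assumptions made for this illustration, and only the brand-first case is handled.

```python
# Brand labels taken from the example in the text; a real list would be much longer.
KNOWN_BRANDS = ["ama", "amaz", "amazon"]

def candidate_splits(label: str, brands=KNOWN_BRANDS) -> list[tuple[str, str]]:
    """Return every (brand, keyword) reading in which the label starts with a brand."""
    return [
        (brand, label[len(brand):])
        for brand in brands
        if label.startswith(brand) and len(label) > len(brand)
    ]

for brand, keyword in candidate_splits("amazone"):
    print(f"brand={brand!r} keyword={keyword!r}")
```

All three readings come back as equally plausible, so choosing between them needs outside context such as language or regional brand popularity.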
Filtering the domains
Now, we greatly simplify the task. We restrict the analysis to the first type of keywords, which are easy to extract: those in domain names containing hyphens. Given the sheer volume of data, we expect the keyword distribution on this subset to follow the distribution on the entire dataset.
In a nutshell, the initial process included four major steps:
Take the domain names from our phishing list as input
Strip the TLD
Filter the remaining list for common brand names
Split by the hyphen
Once these four steps are complete, we collect all the resulting words in one large list. We then refine this list further by removing brand names, leaving only the keywords. Finally, we count how often each keyword occurs.
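The steps above can be sketched in a few lines. This is a rough illustration under stated assumptions rather than our actual pipeline: the `BRANDS` set is a hypothetical stand-in for a much larger brand list, TLD stripping simply drops the last dot-separated label (so multi-label suffixes and subdomains are not handled), and the refinement step drops any token that contains a brand name.

```python
from collections import Counter

# Hypothetical brand list; a real one would be far larger.
BRANDS = {"amazon", "apple", "paypal", "rakuten"}

def strip_tld(domain: str) -> str:
    """Refang '[.]' and drop the last dot-separated label (the TLD)."""
    name = domain.lower().replace("[.]", ".").rstrip(".")
    return name.rsplit(".", 1)[0]

def count_keywords(domains, brands=BRANDS) -> Counter:
    """Count hyphen-delimited keywords in brand-related domains."""
    counts = Counter()
    for domain in domains:                              # step 1: input list
        label = strip_tld(domain)                       # step 2: strip the TLD
        if not any(b in label for b in brands):         # step 3: brand filter
            continue
        tokens = [t for t in label.split("-") if t]     # step 4: split by hyphen
        if len(tokens) < 2:
            continue  # no hyphen, so no easily extractable keyword
        # Refinement: drop tokens that contain a brand name.
        counts.update(t for t in tokens if not any(b in t for b in brands))
    return counts

# Example domains taken from this post, not from the real dataset.
phishing_domains = [
    "accountpaypal-com[.]info",
    "com-apple[.]co",
    "apple1-jp[.]com",
    "jp-rakuten[.]com",
    "amazon-e[.]com",
    "amazone[.]com",  # no hyphen: skipped
]
for keyword, n in count_keywords(phishing_domains).most_common():
    print(keyword, n)
```

On this toy input, “com” and “jp” are each counted twice and “e” once. Dropping whole tokens that contain a brand is a deliberate simplification: it discards the “account” in accountpaypal, which is exactly the concatenation problem described earlier.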
The 50 most popular combosquatting keywords
Through the filtering process above, we are left with the most popular combosquatting keywords targeting popular brands. We know this to be true because the inputs themselves were all confirmed to be phishing domains in the past.
Table 3 lists the top 10 combosquatting keywords extracted through this process, by popularity rank. You can find the comprehensive top 50 list in our GitHub.
| Rank | Keyword |
|---|---|
| 1 | support |
| 2 | com |
| 3 | login |
| 4 | help |
| 5 | secure |
| 6 | www |
| 7 | account |
| 8 | app |
| 9 | verify |
| 10 | service |
Table 3: The top 10 combosquatting keywords in order of popularity
This gives us a very clear view of what keywords attackers are leveraging to defraud victims. As you can see, the most commonly used combosquatting keyword is “support.” This is likely because legitimate support pages are often portals within a site, thus making the URL something like support[.]company-name[.]com.
Surprises
A surprising finding was “com.” It is a keyword that we had not expected to have such a high rank, and we would not have been aware of it without the outlined data-driven approach and our vast dataset. Some examples are accountpaypal-com[.]info and com-apple[.]co.
Another one is “jp,” the TLD for Japan. Some examples are apple1-jp[.]com and jp-rakuten[.]com. There are a few other keywords on this list that are TLD codes: “US,” “UK,” and “FR.” This may point to the countries being targeted the most.
Conclusion
The potential applications of cybersquatting are innumerable, and the targeted victims range from individual consumers to large corporations. This makes it difficult to quantify the total damage that cybersquatting is causing. In addition, we believe cybersquatting campaigns are severely underreported; many only get press once a large entity is affected.
These large-scale incidents, and their smaller cousins, are lucrative — and it is critical that we perform analyses such as this one to increase our understanding of attacker behavior.
We will continue to monitor threats such as these for further analysis and share the findings with the community. To keep up to date with the latest security research, follow us on Twitter.