The Web Scraping Problem: Part 1
What is web scraping?
A ton of information and data is available online. Digital commerce and travel sites enable shoppers to browse their entire catalog 24/7; media sites offer thousands of articles on different topics; and social media platforms provide on-demand insight into individuals, their interests, and their opinions.
This represents a huge source of information that can be exploited and monetized. But before it can be exploited, it must be harvested — and there’s no better way to harvest information than by using a botnet. Botnets that collect information that’s freely available on the internet are called scrapers.
Web scraping is a complex problem. While some scraping activity is essential to a website's success, it can be challenging to determine the value of other activities — and the intention of the entity initiating the scraping.
In this three-part blog series, we’ll quantify the bot traffic on the internet, then discuss the different aspects of web scraping and how they affect various industries. Finally, we’ll review the characteristics of botnets and show you how to manage them effectively.
Three ways web scrapers affect websites
Scraping activity may target product detail pages, prices, or inventory (or all of these) and affect targeted websites in at least three ways.
1. Web scraping can sometimes be poorly calibrated and cause performance, stability, and availability issues for targeted websites. The effect resembles an unintentional distributed denial-of-service (DDoS) attack, and the resulting degradation can lead to revenue loss.
2. Scraping activity skews site metrics. One metric that website owners monitor closely is conversion rate, or the ratio of the total number of purchases to the total number of pages visited. Excess scraping traffic will significantly skew that ratio and reduce a company’s visibility into the effectiveness of its marketing and product positioning strategies (see the sketch after this list).
3. Scraping may also result in long-term revenue loss if the extracted intelligence is used to design a successful marketing campaign that hijacks the audience from the targeted site.
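As a back-of-the-envelope illustration of the metric-skew problem, consider how undetected bot page views dilute conversion rate. The numbers below are made up for illustration and are not drawn from the study that follows.

```python
# Hypothetical figures, purely for illustration.
purchases = 1_000
human_page_views = 50_000
bot_page_views = 150_000  # undetected scraper traffic inflating page views

true_rate = purchases / human_page_views                         # 2.00%
observed_rate = purchases / (human_page_views + bot_page_views)  # 0.50%

print(f"true conversion rate:     {true_rate:.2%}")
print(f"observed conversion rate: {observed_rate:.2%}")
```

With three bot page views for every human one, the measured conversion rate drops to a quarter of its true value, even though actual buyer behavior has not changed at all.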
How much internet traffic is driven by web scraping?
Of all use cases involving bots, web scraping represents the highest traffic volume by far. That’s why quantifying scraping activity helps us assess how much internet traffic comes from bots.
Depending on who you ask, the bot-to-overall traffic ratio will vary. In truth, nobody has precise numbers because nobody can monitor the entire internet and run bot detection on it. Even if they could, detection would not be 100% accurate despite the web security industry’s best efforts.
Bot activity comes and goes. Today’s web scraping traffic may not be the same as tomorrow’s. However, since Akamai carries up to 30% of internet traffic and is developing the next generation of scraper detection, we’re very well positioned to provide the most relevant statistics.
To look into the question of how much traffic comes from web scraping, we took a one-week traffic snapshot from a representative sample of customers who have been evaluating our new scraper detection technology over the last few months. This includes global fashion brands, major US retailers, international airlines, and hotel chains. The sample used for the study consists of 1.16 billion HTML page and API requests. Requests for static content such as images, style sheets, JavaScript, videos, and fonts were ignored.
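As a rough sketch of the kind of filtering described above, one might separate page and API requests from static-asset requests by file extension. The extension list and helper below are our own simplifications for illustration, not a description of Akamai’s internal tooling.

```python
from urllib.parse import urlparse

# Extensions treated as static assets; everything else counts toward the sample.
# This list is an assumption chosen for illustration.
STATIC_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico",
    ".css", ".js", ".woff", ".woff2", ".ttf", ".mp4",
}

def counts_toward_sample(url: str) -> bool:
    """Return True for HTML page or API requests, False for static assets."""
    path = urlparse(url).path.lower()
    return not any(path.endswith(ext) for ext in STATIC_EXTENSIONS)

logged_urls = [
    "https://example.com/products/shoes",    # counted
    "https://example.com/assets/logo.svg",   # ignored (static)
    "https://example.com/api/v1/inventory",  # counted
]
print(sum(counts_toward_sample(u) for u in logged_urls))  # 2
```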
From the study, one fact became apparent: The level of bot activity a site may receive largely depends on the popularity of the brand’s products or services and their ranking in the market. Figure 1 shows an example of a sportswear brand catering to the US market. Only 17.7% of the traffic on their product pages comes from real humans; the remaining 82.3% comes from bots. Of that 82.3%, only a small portion comes from good bots, such as web search engine, SEO, and social media bots. The rest comes from scrapers.
In contrast, Figure 2 shows the traffic distribution for a retailer selling auto parts online with a mild bot problem. In this case, just over 15% of the traffic comes from bots, two-thirds of which comes from good bots.
Figure 3 shows a home improvement company catering to the US market, where traffic is more evenly split among scrapers, humans, and good bots. Even so, bot activity accounts for 60% of the total traffic to the product pages.
Looking at all the sampled customers, the average bot-to-human traffic ratio is about 70:30, as seen in Figure 4. The sample used for the study is not large enough to conclude that 70% of the internet consists of bot traffic, but we can safely conclude that the great majority of traffic on the internet originates from bots.
The impact of web scraping by industry
When a botnet scrapes a website, it’s harvesting publicly available information. The botnet operators regularly trigger scraping activity to collect their desired information, including product details, pricing, and inventory. Web scraping is a nuisance and violates most websites’ acceptable use policies (AUPs). But is it illegal?
As with many legal issues, the answer is “it depends.” There are few recent legal regulations on this issue. Additionally, scraping activity may come from countries (or even US states) with different regulations from those in the country where the site is hosted, leading to difficulties in enforcing the AUP or in determining what legal regulation applies. Lastly, although the scraping activity can be detected, bot operators are masters of hiding their tracks, making it challenging to attribute the activity to any particular person or entity for prosecution.
Web scraping covers different use cases, and depending on the context, not all of them have a negative impact. Ecommerce, travel, hospitality, and media are the industries most affected. Figure 5 shows the consequences of scraping based on industry and intent.
All industries share a common need to serve their content to web search engines, SEO, or social media bots that scrape their sites to ensure content is visible and attracts customers. Site monitoring services also regularly pull content from the site to evaluate response times and availability across the globe.
Traffic coming from these so-called known bots, like Googlebot or Bingbot, must be accurately identified to prevent serving content to impersonators. In general, most website owners would like to avoid scraping activity that’s difficult to attribute to any specific entity.
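One widely documented way to confirm that a request claiming to come from Googlebot or Bingbot really does is a reverse DNS lookup on the client IP, followed by a confirming forward lookup. The sketch below is a simplified illustration of that check; the trusted domain suffixes and the sample IP are assumptions for the example, and production systems typically also rely on published IP ranges and other signals.

```python
import socket

# Domains that Google and Microsoft publish for their crawlers.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")

def is_verified_known_bot(client_ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then confirm with a forward lookup."""
    try:
        hostname, _, _ = socket.gethostbyaddr(client_ip)  # reverse DNS
    except OSError:
        return False
    if not hostname.endswith(TRUSTED_SUFFIXES):
        return False
    try:
        _, _, forward_ips = socket.gethostbyname_ex(hostname)  # forward DNS
    except OSError:
        return False
    return client_ip in forward_ips  # the hostname must map back to the same IP

# Example (illustrative IP): a self-identified "Googlebot" from an unrelated
# network fails this check, while a genuine crawler address passes.
print(is_verified_known_bot("66.249.66.1"))
```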
Travel and hospitality
Travel aggregators and booking engines scrape travel and hospitality websites. These sites allow travelers to easily find flights, hotel rooms, car rentals, and even entertainment at their destination. Booking engines may sell their own products or services on top of the consumer’s airfare or accommodations.
To make the content readily available to the aggregator, the airline or hotel may make the data accessible through APIs or allow the aggregator to scrape their site. There is typically a mutually beneficial agreement in place between the two parties.
However, some airlines or hotel brands may only want to be associated with their specific booking engine partners. Aggregators sometimes scrape these sites directly, advertise, and resell the airfares or hotel rooms without the brand’s consent, which violates the AUP. Depending on the website's architecture, scraping activity may lock the airfare or room inventory for a few minutes, making it unavailable to legitimate users.
Ecommerce
Ecommerce sites receive a significant amount of activity from bot operators that scrape a site in search of items for sale during an event, an activity commonly referred to as scalping. This activity is usually sporadic, spiking around specific events.
Most web scraping traffic, however, comes from entities harvesting data to extract business intelligence. This can be initiated by well-funded companies that specialize in data extraction or by individual companies that want to monitor their competition. In either case, the company generally uses the data to define its marketing and product strategy.
Digital media
Digital media websites want to attract as many users as possible who may be interested in the content they produce. News outlets generally welcome scraping activity from news aggregators or larger news organizations to potentially have their content referenced, which increases their visibility. More readers or viewers translate into more subscriptions, ad impressions, and revenue.
For premium media services like The Wall Street Journal, content may only be available to paid subscribers. Although it’s essential that web search engines and social media bots be able to index and reference the content behind a paywall, these good bots must be accurately identified to ensure the full content is safe from scrapers that attempt to impersonate legitimate bots such as Googlebot or Facebook’s crawler.
Social media
Consumers have long used social media like X (formerly Twitter), Instagram, and Facebook to share their opinions. These platforms have become a huge source of information that marketers can use to infer user sentiment toward products, companies, and even social issues.
Companies that specialize in business intelligence and product marketing also scrape social media sites to collect and process data through complex machine learning algorithms, extracting valuable insights that they can use to optimize product strategy. Social media sites may sell that data or restrict scraping activity.
Social media and dating sites hold a trove of personal data about individual users, making user profile scraping a major privacy concern. While a company searching on LinkedIn for professionals to hire is entirely acceptable and in line with the site's purpose, it is unacceptable for a company or individual to scrape all profiles to build a personal information database that could later be sold for profit.
Shield yourself from malicious scraping activity
Botnets can be complex and unpredictable. No matter your industry or background, it’s wise to protect your data from web scrapers — and drive better, more productive traffic in the process.
Akamai Content Protector can help. As the next generation of scraper protection, it’s built to detect and mitigate most scraping activity, even as attack strategies evolve. It has also been tested by major online brands, including those facing the most significant web scraping challenges.
In today’s fast-paced cyber landscape, the right tools are more important than ever. With Akamai, you get just that — along with the confidence you need to thrive over the long term.