Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

Baidu launches AI video generator, overhauls search features | Technology News

European bosses press Brussels to pause AI rules — Capital Brief

Freshfields Offers LL.M in LLMs For New Lawyers – Artificial Lawyer

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • Amazon (Titan)
    • Anthropic (Claude 3)
    • Cohere (Command R)
    • Google DeepMind (Gemini)
    • IBM (Watsonx)
    • Inflection AI (Pi)
    • Meta (LLaMA)
    • OpenAI (GPT-4 / GPT-4o)
    • Reka AI
    • xAI (Grok)
    • Adobe Sensi
    • Aleph Alpha
    • Alibaba Cloud (Qwen)
    • Apple Core ML
    • Baidu (ERNIE)
    • ByteDance Doubao
    • C3 AI
    • DataRobot
    • DeepSeek
  • AI Research & Breakthroughs
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Education AI
    • Energy AI
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Media & Entertainment
    • Transportation AI
    • Manufacturing AI
    • Retail AI
    • Agriculture AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
Facebook X (Twitter) Instagram
Advanced AI News
AI Search

understanding AI’s impact on content providers

Advanced AI EditorBy Advanced AI EditorJuly 3, 2025No Comments8 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email


Content publishers welcomed crawlers and bots from search engines because they helped drive traffic to their sites. The crawlers would see what was published on the site and surface that material to users searching for it. Site owners could monetize their material because those users still needed to click through to the page to access anything beyond a short title.

Artificial Intelligence (AI) bots also crawl the content of a site, but with an entirely different delivery model. These Large Language Models (LLMs) do their best to read the web to train a system that can repackage that content for the user, without the user ever needing to visit the original publication.

The AI applications might still try to cite the content, but we’ve found that very few users actually click through relative to how often the AI bot scrapes a given website. We have discussed this challenge in smaller settings, and today we are excited to publish our findings as a new metric shown on the AI Insights page on Cloudflare Radar.

Visitors to Cloudflare Radar can now review how often a given AI model sends traffic to a site relative to how often it crawls that site. We are sharing this analysis with a broad audience so that site owners can have better information to help them make decisions about which AI bots to allow or block and so that users can understand how AI usage in aggregate impacts Internet traffic.

How does this measurement work?

As HTML pages are arguably the most valuable content for these crawlers, the ratios displayed are calculated by dividing the total number of requests from relevant user agents associated with a given search or AI platform where the response was of Content-type: text/html by the total number of requests for HTML content where the Referer header contained a hostname associated with a given search or AI platform.

The diagrams below illustrate two common crawling scenarios, and show that companies may use different user agents depending on the purpose of the crawler. The top one represents a simple transaction where the example AI platform is requesting content for the purposes of training an LLM, representing itself as AIBot. The bottom one represents a scenario where the example AI platform is requesting content to service a user request — looking for flight information, for example. In this case, it is representing itself as AIBot-User. Request traffic from both of these user agents would be aggregated under a single platform name for the purposes of our analysis. 

When a user clicks on a link on a website or application, the client will often send a Referer: header as part of the request to the target site. In the diagram below, the example AI platform has returned content that contains links to external sites in response to a user interaction. When the user clicks on a link, a request is made to the content provider that includes ai.example.com in the Referer: header, letting them know where that request traffic came from. Hostnames are associated with their respective platforms for the purpose of our analysis.

The new metric is presented as a simple table, comparing the number of aggregate HTML page requests from crawlers (user agents) associated with a given platform to the number of HTML page requests from clients referred by a hostname associated with a given platform. The calculated ratio is always normalized to a single referral request.

The table below shows that for the period June 19-26, 2025, as an example, the ratios range from Anthropic’s 70,900:1 down to Mistral’s 0.1:1. This means that Anthropic’s AI platform Claude made nearly 71,000 HTML page requests for every HTML page referral, while Mistral sent 10x as many referrals as crawl requests. (However, traffic referred by Claude’s native app does not include a Referer: header, and we believe that the same holds true for traffic generated from other native apps as well. As such, because the referral counts only include traffic from the Web-based tools from these providers, these calculations may overstate the respective ratios, but it is unclear by how much.)

Of course, due in part to changes in crawling patterns, these ratios will change over time. The table above also displays the ratio changes as compared to the previous period, with changes ranging from increases of over 6% for DuckDuckGo and Yandex to Google’s 19.4% decrease. The week-over-week drop in Google’s ratio is related to an observed drop in crawling traffic from GoogleBot starting on June 24, while Yandex’s week-over-week growth is related to an observed increase in YandexBot crawling activity that started on June 21, as seen in the graphs below.

Radar’s Data Explorer includes a time series view of how these ratios change over time, such as in the Baidu example below. The time series data is also available through an API endpoint.

Patterns in referral traffic

Changes and trends in the underlying activity can be seen in the associated Data Explorer view, as well as in the raw data available via API endpoints (timeseries, summary). Note that the shares of both referral and crawl traffic are relative to the sets of referrers and crawlers included in the graphs, and not Cloudflare traffic overall.

For example, in the referrer-centric view below, covering nearly the first four weeks of June 2025, we can see that referral traffic is dominated by search platform Google, with a fairly consistent diurnal pattern visible in the data. (The google.* entry covers referral traffic from the main google.com site, as well as local sites, such as google.es or google.com.tw.) Because of prefetching driven by the use of speculation rules, referral traffic coming from Google’s ASN (AS15169) is specifically excluded from analysis here, as it doesn’t represent active user consumption of content.

Clear diurnal patterns are also visible in the referral request shares of other search platforms, although the request shares are a fraction of what is seen from Google. 

Throughout June, the share of traffic referred by AI platforms was significantly lower, even in aggregate, than the share of traffic referred by search platforms.

Changes in crawling traffic

As noted above, the change in ratio values over time can be driven by shifts in crawling activity. These shifts are visible in the crawling traffic shares available in Data Explorer, as well as in the raw data available via API endpoints (timeseries, summary). In the crawler-centric view below, covering nearly the first four weeks of June 2025, we can see that the share of requests related to Google’s crawling activity for both their Googlebot and GoogleOther identifiers falls over the course of the month, with several peak/valley periods. A similar pattern observed in HTTP request traffic from Google’s AS15169 during that same time period loosely matches this observed drop in share.

In addition, it appears that OpenAI’s GPTBot saw multiple periods where little-to-no crawling activity was observed throughout the month.

What this means for content providers

These ratios directly impact the viability of content publication on the Internet. While they will vary over time, the trend continues to be more crawls and fewer referrals when compared in relation to each other. Legacy search index crawlers would scan your content a couple of times, or less, for each visitor sent. A site’s availability to crawlers made their revenue model more viable, not less.

The new data we are observing suggests that is no longer the case. These models continue to consume more content, more frequently, despite sending the same or less traffic to the source of its content.

We have released new tools over the last year to help site owners take control back. With a single click, publishers can block the kinds of AI crawlers that train against their content. And today, we announced new ways to make the exchange of value fair for both sides of the equation. However, we continue to recommend that content creators audit and then enforce their preferred policies for AI crawlers.

In addition to providing these new insights around crawling and referral traffic and associated trends, we’ve also taken the opportunity to launch expanded Verified Bots content. The Bots page on Cloudflare Radar includes a paginated list of Verified Bots, displaying the bot name, owner, category, and rank (based on request volume). This list has now been expanded into a standalone directory in a new Bots section. The directory, shown below, displays a card for each Verified Bot, showing the bot name, a description, the bot owner and category, and verification status. Users can search the directory by bot name, owner, or description, and can also filter by category (selecting just Monitoring & Analytics bots, for example).

Clicking on a bot name within a card brings up a bot-specific page that includes metadata about the bot, information on how the bot’s user agent is represented in HTTP request headers and how it should be specified in robots.txt directives, and a traffic graph that shows associated HTTP request volume trends for the selected time period (with a default comparison to the previous period). Associated data is also available via the API. As we add additional information to these bot-specific pages in the future, we will document the updates in Changelog entries.



Source link

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleEuropean renewable stocks to watch as U.S. megabill hangs in the balance
Next Article Awesome Research For Everyone! – Two Minute Papers Channel Trailer
Advanced AI Editor
  • Website

Related Posts

Google AI announcements from June

July 2, 2025

AI helps find formula for paint to keep buildings cooler | Artificial intelligence (AI)

July 2, 2025

What is AI mode, and how to use it? Here’s everything you need to know

July 2, 2025
Leave A Reply Cancel Reply

Latest Posts

Khaled Sabsabi Reinstated as Australia’s Venice Biennale Artist

Peter Phillips, British Pop Art Originator, Dies at 86

Hundreds of Ancient Ceramics Found In Preserved Shipwreck in Turkey

Canaletto Auction Record Smashed at Christie’s London

Latest Posts

Baidu launches AI video generator, overhauls search features | Technology News

July 3, 2025

European bosses press Brussels to pause AI rules — Capital Brief

July 3, 2025

Freshfields Offers LL.M in LLMs For New Lawyers – Artificial Lawyer

July 3, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Recent Posts

  • Baidu launches AI video generator, overhauls search features | Technology News
  • European bosses press Brussels to pause AI rules — Capital Brief
  • Freshfields Offers LL.M in LLMs For New Lawyers – Artificial Lawyer
  • Paper page – A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
  • Veolia and Mistral AI_ Join Forces to Revolutionize Resource Efficiency Management With Generative AI and Accelerate the Ecological Transformation

Recent Comments

No comments to show.

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

YouTube LinkedIn
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.