Cloudflare’s robots.txt update gives news publishers new signals for Google AI Overviews and other AI crawlers

By Advanced AI Editor, September 29, 2025

Robots.txt got some much-needed TLC last week, courtesy of Cloudflare’s latest update. 

Cloudflare’s new Content Signals Policy effectively upgrades the decades-old honor system, adding a way for publishers to spell out how they do, and perhaps more importantly how they don’t, want AI crawlers to use their content once it’s scraped.

For publishers, that distinction matters because it shifts the robots.txt file from a blunt yes-or-no tool into a way of distinguishing between search, AI training and AI outputs. And that distinction goes to the heart of how their content is used, valued and potentially monetized. 

The policy includes the option to signal that AI systems shouldn’t use a publisher’s material for things like Google’s AI Overviews or inference; the sketch below shows roughly what that could look like in a robots.txt file.
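For illustration only, here is a minimal sketch of a content-signals-style robots.txt entry. The Content-Signal line and its search / ai-input / ai-train keys follow the general shape Cloudflare has described, but treat the exact directive names, values and placement here as assumptions rather than canonical syntax; publishers should take the actual wording from Cloudflare’s own policy text or tooling.

    # Illustrative sketch only, not Cloudflare’s canonical policy text.
    #   search   = build a search index and link back to the source
    #   ai-input = use content as input to AI answers (e.g. AI Overviews, RAG)
    #   ai-train = use content to train or fine-tune AI models
    User-agent: *
    Content-Signal: search=yes, ai-input=no, ai-train=no
    Allow: /

As with the rest of robots.txt, lines like these express a preference rather than enforce one, which is exactly the gap the rest of this piece is concerned with.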

Several publishers Digiday has spoken to over the last several months have at one point or another described the current robots.txt as “unfit for purpose.” And while this upgrade still doesn’t ensure AI compliance, it at least sets a new precedent for transparency: publishers can now spell out, in black and white, how they want AI crawlers to use their content, a move many have welcomed as long overdue.

And yet, none are blind to the glaringly obvious: without enforceability, the risk remains that AI platforms will still extract value from their work without compensation.

“The Policy separates out search, AI-train, and AI-crawl, which is a well-evolved understanding of how publishers should think about AI,” said Justin Wohl, vp of strategy for Aditude and former chief revenue officer for fact-checking site Snopes and TV Tropes. 

Cloudflare’s policy distinguishes between different ways AI systems use content: ‘search,’ where material might be pulled into something like an AI Overview with the potential for attribution or referral; ‘train,’ where content is ingested to build the model itself, often without compensation; and ‘crawl,’ where bots systematically scrape pages. For publishers, separating these use cases matters because only one of them offers even the possibility of return, while the others risk extracting value without reward, noted Wohl.

“The Content Signals Policy is an increasingly necessary solution in that when Google is creating its AI Overviews, the bots are somewhat indistinguishable from humans as they navigate sites, and are going to cause publishers’ IVT scores to explode, if the user agents haven’t been identifiable and the scoring impacts of them mitigated by the companies measuring such things for advertisers,” added Wohl. 

Five publishers Digiday spoke to for this article said the update to the robots.txt signals is a good start in letting publishers dictate how their data is used for search versus AI training. “That much-needed nuance is overdue and a genuinely positive step forward,” said Eric Hochberger, CEO and co-founder of Mediavine. “I’d love to see it go further to truly empower publishers to regain control over their content,” he added. 

That’s something other initiatives, like the Responsible AI Licensing Standard (RSL) being developed by groups including Reddit, Fastly and news publishers, are working on. Whereas Cloudflare’s update gives publishers the ability to specify what AI crawlers are allowed to do with their content, RSL has created a standard for publishers to then set up AI remuneration: essentially royalties for whenever their content is scraped for retrieval-augmented generation (RAG).

Cloudflare will add the new policy language to robots.txt for customers that use it to manage their files, and is publishing tools for others who want to customize how crawlers use their content.

Progress, but still an elephant in the room 

For all the positives, neither RSL nor Cloudflare’s update addresses the elephant in the room: whether AI crawlers will actually honor these signals, especially the one publishers care about most – Google. 

Google technically separates its search crawler (Googlebot) and its AI crawler (Google-Extended), but in practice they overlap. Even if a publisher blocks Google-Extended, their content can still show up in AI Overviews, because those are tied to Google Search. In other words, AI Overviews are bundled with the core search crawler, not treated as a separate opt-in. That has meant most publishers haven’t been able to opt out of Google’s AI crawler for fear of their search traffic being affected. 
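The practical effect is easy to see with Python’s standard urllib.robotparser, which applies robots.txt rules the way a well-behaved crawler would. The file below is hypothetical (the domain and rules are made up for illustration): disallowing Google-Extended has no effect on Googlebot, and it is Googlebot-crawled content that feeds Search and, by extension, AI Overviews.

    from urllib.robotparser import RobotFileParser

    # Hypothetical robots.txt for an example publisher: block Google's AI-training
    # agent (Google-Extended) while leaving the core search crawler (Googlebot) alone.
    robots_lines = [
        "User-agent: Google-Extended",
        "Disallow: /",
        "",
        "User-agent: Googlebot",
        "Allow: /",
    ]

    parser = RobotFileParser()
    parser.parse(robots_lines)

    article = "https://example-publisher.com/some-article"  # placeholder URL
    print(parser.can_fetch("Google-Extended", article))  # False: AI-training crawls are refused
    print(parser.can_fetch("Googlebot", article))        # True: search crawling continues

Nothing in such a file stops content fetched by Googlebot from being summarized in an AI Overview, which is precisely the bundling publishers object to.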

“I think it [content signals policy] is an interesting idea. But I don’t see any indication that Google and others will follow it,” said a senior exec at a large news organization, who spoke on condition of anonymity. “Google has been pretty clear they see AI summaries as fair use.”

Earlier this month, media group Penske became the biggest publisher to sue Google specifically for allegedly harming its traffic with AI Overviews and for alleged illegal content scraping. Meanwhile, the tech giant is currently working out remedies with the DOJ in court to determine how it rectifies what has been deemed an illegal monopoly of its ad exchange and ad server.

“Publishers all should commonly be in alignment that AI and Search crawlers should be distinguishable and treated differently,” said Wohl. “I do hope that Google, perhaps via the Chrome team, will see the sensibility in this from the perspective of how their browser works and impacts downstream parties,” he added.  

While publishers have welcomed Cloudflare’s update because of the added clarity, many acknowledge it’s just a stopgap: without guaranteed enforcement, the real risks from AI are still only partially addressed. But it’s progress.

It sets an important legal precedent, said Paul Bannister, CRO of Raptive. “It puts in parameters that a good actor should follow and if they don’t, you can take [legal] action. You may not win, but you can take action. You can, of course, ignore legal stuff, but if you do, you’re taking a real risk that there can be issues there. So much of this is laying the groundwork for how this is all going to look. It’s a small step forward, but it pushes the ball in the right direction.” 


