Reddit Sues Anthropic For Scraping Content To Train Claude AI

Reddit has sued AI startup Anthropic for allegedly scraping and using the platform’s user-generated content without permission to train its Claude chatbot. The lawsuit, filed in San Francisco Superior Court on June 4, accuses Anthropic of breaching Reddit’s terms of service, undermining user privacy and control over their data, and misappropriating platform data for commercial gain.

The complaint claims Anthropic accessed Reddit’s servers over 100,000 times and included Reddit posts among the “good samples” used to fine-tune Claude. Reddit says the company used that data to build and monetise Claude while publicly claiming to avoid the use of “stolen” or unauthorised datasets.

“Reddit brings this action to stop Anthropic, who tells the world that it does not intend to train its models with stolen data, from doing just that,” the complaint reads.

The company is seeking an injunction to stop further use of its data, deletion of all Reddit-derived training material, financial restitution, and punitive damages.

Key Claims Against Anthropic

Reddit’s lawsuit outlines a range of allegations spanning technical access, legal violations, and business harms. These include:

Breach of Reddit’s User Agreement and Developer Terms

Reddit argues that Anthropic violated its platform rules by engaging in large-scale scraping for commercial purposes, which Reddit’s terms explicitly prohibit.

“Scraping for commercial purposes is strictly prohibited and violates Reddit’s Terms”, the complaint added.

The lawsuit further stated, “Anthropic violated both the User Agreement and the Developer Terms of Service.”

Reddit also states that it offered Anthropic a licensing agreement, which the company refused. Instead, Anthropic allegedly continued to extract Reddit content in violation of both the Application Programming Interface (API) and website terms.

Evidence of Unauthorised Use

According to the complaint, Anthropic:

Used Reddit data to fine-tune Claude’s language model

Reproduced deleted Reddit content in some Claude responses

Referenced specific Reddit communities in outputs, indicating prior exposure

Retained and exploited Reddit content without permission

Accessed or attempted to access Reddit over 100,000 times

Notably, Reddit alleges that Claude was trained on a whitelist of “high-quality” subreddits, including popular communities such as r/science, r/explainlikeimfive, r/AskHistorians, r/relationship_advice, r/programming, r/todayilearned, and r/Fitness, among others.

The company also notes that Claude admitted it may have been trained on posts that were later deleted, and acknowledged it has no way of knowing whether specific training data came from deleted or active sources, raising concerns about how long user content is retained after deletion.

Harm to Platform Operations and Business

Reddit states that Anthropic’s scraping activities interfered with its servers, diminished its ability to enforce content controls, and harmed its brand and licensing potential.

“Anthropic’s scraping interferes with Reddit’s servers and its relationship with users and partners”, the complaint added.

Anthropic’s Alleged Business Model Relied on Free Platform Content

Reddit claims that Anthropic’s business model is built on avoiding licensing deals while harvesting freely available content.

“Anthropic’s business model is built on harvesting free content from platforms like Reddit without compensating them”, the complaint read.

It also points to Amazon’s investment of approximately $8 billion in Anthropic since 2023 as evidence of the commercial value derived from the unauthorised use of Reddit content.

Legal Grounds Cited by Reddit

While the term misappropriation appears throughout the complaint, Reddit does not list it as a separate legal cause of action. Instead, it uses the term to frame Anthropic’s alleged exploitation of Reddit’s content.

Reddit Seeks Damages, Deletion of AI Training Data, and an Injunction

In its relief request, Reddit is asking the court to:

Prohibit Anthropic from continuing to use Reddit data

Compel Anthropic to delete all Reddit-derived content from Claude’s training

Award damages for harm to Reddit’s brand and business

Order restitution for Anthropic’s unjust enrichment

Impose punitive damages where allowed

Recover attorneys’ fees and legal costs

What’s Happening in India on AI Scraping and Data Ownership

India is clamping down on how companies collect and use personal data. The Digital Personal Data Protection Act, 2023 requires companies to obtain clear, affirmative consent before processing any digital information that can identify someone. The law doesn’t mention AI or scraping by name, but it still applies to those actions if they involve identifiable data. It also mandates that companies notify users, allow withdrawal of consent, and delete data when no longer necessary.

While the law doesn’t directly regulate AI training, it raises important questions for firms that scrape forums, comments, or platforms without permission. Additionally, the Ministry of Electronics and IT (MeitY) released a report in early 2025 on the development of AI governance guidelines. The report recommends that developers maintain an AI incident database, ensure traceability of data and systems, and raises questions about whether training data, particularly copyrighted content, should require licensing or other safeguards.

India’s legal system is also beginning to weigh in. At the Delhi High Court, news agency ANI has accused OpenAI of copyright infringement, alleging that ChatGPT reproduces its reports verbatim and fabricates quotes attributed to ANI. Representing OpenAI, Amit Sibal argued that facts, even when reported by journalists, are not copyrightable under Indian law. Only the specific language used to express them can be protected. He added that OpenAI has no servers or office in India but acknowledged that Indian courts can exercise jurisdiction if local entities are affected. The case is ongoing, with the court examining whether ChatGPT’s responses amount to infringement or fair use of public information.

The ANI-OpenAI dispute is a clear sign that platforms and regulators are starting to confront the question of who owns digital content and who gets to use it for AI.

Why This Matters

Reddit is taking steps to turn its user-generated content into a real business asset. In 2024, it licensed that data to Google in a deal reportedly worth $60 million to support AI development.

In that context, the lawsuit against Anthropic marks an effort to assert commercial rights and deter unauthorised scraping.

This shift is already reshaping how AI companies access training data, as platforms start gating APIs and pushing for licensing deals, increasing the cost of obtaining high-quality datasets.

The case also lands in the middle of a growing legal pushback. The New York Times has sued OpenAI, and Getty is in a courtroom fight with Stability AI. These cases are starting to define the legal lines around scraping, ownership, and how far AI companies can go.

For You

Source link

What's Hot

Key Priorities for Safe and Responsible Adoption

RewardDance: Reward Scaling in Visual Generation – Takara TLDR

Enhance video understanding with Amazon Bedrock Data Automation and open-set object detection

Reddit Sues Anthropic for Scraping Content to Train Claude AI

What is Claude AI and who funds it?

Anthropic Claude AI Experiences Outage, Developers Reflect on AI Tool Dependency and API Stability_the_again_model

Claude’s new AI file-creation feature ships with security risks built in

Sally Mann Says Her Black Men Photos Are ‘Problematic’ in Hindsight

NeueHouse, a Hot Spot for Art Events, Files for Bankruptcy

National Gallery and Tate Have ‘Bad Blood’—and More Art News

Christie’s Will Auction The First Calculating Machine In History