Reddit has sued AI startup Anthropic for allegedly scraping and using the platform’s user-generated content without permission to train its Claude chatbot. The lawsuit, filed in San Francisco Superior Court on June 4, accuses Anthropic of breaching Reddit’s terms of service, undermining user privacy and control over their data, and misappropriating platform data for commercial gain.
The complaint claims Anthropic accessed Reddit’s servers over 100,000 times and included Reddit posts among the “good samples” used to fine-tune Claude. Reddit says the company used that data to build and monetise Claude while publicly claiming to avoid the use of “stolen” or unauthorised datasets.
“Reddit brings this action to stop Anthropic, who tells the world that it does not intend to train its models with stolen data, from doing just that,” the complaint reads.
Reddit is seeking an injunction to stop further use of its data, deletion of all Reddit-derived training material, financial restitution, and punitive damages.
Key Claims Against Anthropic
Reddit’s lawsuit outlines a range of allegations spanning technical access, legal violations, and business harms. These include:
Breach of Reddit’s User Agreement and Developer Terms
Reddit argues that Anthropic violated its platform rules by engaging in large-scale scraping for commercial purposes, which Reddit’s terms explicitly prohibit.
“Scraping for commercial purposes is strictly prohibited and violates Reddit’s Terms”, the complaint added.
The lawsuit further stated, “Anthropic violated both the User Agreement and the Developer Terms of Service.”
Reddit also states that it offered Anthropic a licensing agreement, which the company refused. Instead, Anthropic allegedly continued to extract Reddit content in violation of both the Application Programming Interface (API) and website terms.
Evidence of Unauthorised Use
According to the complaint, Anthropic:
Used Reddit data to fine-tune Claude’s language model
Reproduced deleted Reddit content in some Claude responses
Referenced specific Reddit communities in outputs, indicating prior exposure
Retained and exploited Reddit content without permission
Accessed or attempted to access Reddit over 100,000 times
Notably, Reddit alleges that Claude was trained on a whitelist of “high-quality” subreddits, including popular communities such as r/science, r/explainlikeimfive, r/AskHistorians, r/relationship_advice, r/programming, r/todayilearned, and r/Fitness, among others.
Reddit also notes that Claude admitted it may have been trained on posts that were later deleted and acknowledged that it has no way of knowing whether specific training data came from deleted or active sources. This raises concerns about how long user content is retained after deletion.
Harm to Platform Operations and Business
Reddit states that Anthropic’s scraping activities interfered with its servers, diminished its ability to enforce content controls, and harmed its brand and licensing potential.
“Anthropic’s scraping interferes with Reddit’s servers and its relationship with users and partners”, the complaint added.
Anthropic’s Alleged Business Model Relied on Free Platform Content
Reddit claims that Anthropic’s business model is built on avoiding licensing deals while harvesting freely available content.
“Anthropic’s business model is built on harvesting free content from platforms like Reddit without compensating them”, the complaint read.
The complaint also points to Amazon’s investment of approximately $8 billion in Anthropic since 2023 as evidence of the commercial value derived from the unauthorised use of Reddit content.
Legal Grounds Cited by Reddit
While the term misappropriation appears throughout the complaint, Reddit does not list it as a separate legal cause of action. Instead, it uses the term to frame Anthropic’s alleged exploitation of Reddit’s content.
Reddit Seeks Damages, Deletion of AI Training Data, and an Injunction
In its relief request, Reddit is asking the court to:
Prohibit Anthropic from continuing to use Reddit data
Compel Anthropic to delete all Reddit-derived content from Claude’s training
Award damages for harm to Reddit’s brand and business
Order restitution for Anthropic’s unjust enrichment
Impose punitive damages where allowed
Recover attorneys’ fees and legal costs
What’s Happening in India on AI Scraping and Data Ownership
India is clamping down on how companies collect and use personal data. The Digital Personal Data Protection Act, 2023 requires companies to obtain clear, affirmative consent before processing any digital information that can identify someone. The law doesn’t mention AI or scraping by name, but it still applies to those actions if they involve identifiable data. It also mandates that companies notify users, allow withdrawal of consent, and delete data when no longer necessary.
While the law doesn’t directly regulate AI training, it raises important questions for firms that scrape forums, comments, or platforms without permission. Additionally, the Ministry of Electronics and IT (MeitY) released a report in early 2025 on the development of AI governance guidelines. The report recommends that developers maintain an AI incident database and ensure traceability of data and systems. It also raises questions about whether training data, particularly copyrighted content, should require licensing or other safeguards.
India’s legal system is also beginning to weigh in. At the Delhi High Court, news agency ANI has accused OpenAI of copyright infringement, alleging that ChatGPT reproduces its reports verbatim and fabricates quotes attributed to ANI. Representing OpenAI, Amit Sibal argued that facts, even when reported by journalists, are not copyrightable under Indian law. Only the specific language used to express them can be protected. He added that OpenAI has no servers or office in India but acknowledged that Indian courts can exercise jurisdiction if local entities are affected. The case is ongoing, with the court examining whether ChatGPT’s responses amount to infringement or fair use of public information.
The ANI-OpenAI dispute is a clear sign that platforms and regulators are starting to confront the question of who owns digital content and who gets to use it for AI.
Why This Matters
Reddit is taking steps to turn its user-generated content into a real business asset. In 2024, it licensed that data to Google in a deal reportedly worth $60 million to support AI development.
In that context, the lawsuit against Anthropic marks an effort to assert commercial rights and deter unauthorised scraping.
This shift is already reshaping how AI companies access training data, as platforms start gating APIs and pushing for licensing deals, increasing the cost of obtaining high-quality datasets.
The case also lands in the middle of a growing legal pushback. The New York Times has sued OpenAI, and Getty Images is in a courtroom fight with Stability AI. These cases are starting to define the legal lines around scraping, ownership, and how far AI companies can go.