Several writers last year filed a lawsuit against Anthropic alleging that its AI assistant Claude infringed their copyrights: the company trained its large language models on pirated copies of their fiction and nonfiction books.
The training data came from shadow libraries: unauthorized online repositories of books, academic articles, and other content.
Anthropic agreed to pay $1.5 billion to some 500,000 authors whose work was used to train the models without permission. A request for preliminary approval of the settlement was filed Friday with a San Francisco federal judge. But it wasn't this involuntary training, or copyright theft, that caught my attention; it was how the companies behind large language models (LLMs) continue to chip away at consumer privacy.
Why would LLMs be trained on fiction books? Perhaps to improve their ability to understand and generate higher-quality creative writing.
Nonfiction provides factual information, while fiction teaches models nuances of narrative and human experience.
Still, it is fiction, and characters' reactions may not reflect how an actual human would react in a specific situation.
Executives must have known the court was getting close to a decision, because Anthropic recently updated its privacy terms, which take effect on September 28. Users see the notice when logging into Claude.ai. The first sign of the update came in a blog post in late August.
The settlement is among the first of dozens of copyright lawsuits filed against AI companies like OpenAI, Meta, and Midjourney, according to Bloomberg. All alleged misuse of proprietary online content.
Anthropic wrote in a court filing that it felt “inordinate pressure” to cut a deal to avoid a potentially business-ending trial that
could have put the company on the hook for as much as $1 trillion in damages, according to the report.
Emails about the change in terms went out last week to make sure users knew. Anthropic previously did not use consumer chat data to train its models. Now it wants to train its AI systems on user conversations and coding sessions, and it plans to extend data retention to five years for those who do not opt out.
These changes affect only consumer accounts on Claude Free, Pro, and Max plans. They do not apply to Claude for Work, the API, or other services covered by Anthropic's Commercial Terms or other agreements.
Then today, Forbes reported that Anthropic has become the third AI company, following OpenAI and xAI's Grok, whose users' chatbot conversations have found their way into Google search results.
“Unlike OpenAI and xAI though, Anthropic said it blocked crawlers from Google, ostensibly preventing
those pages from being indexed,” Forbes reported. “But despite this, hundreds of Claude conversations still became accessible in search results (they have [since] been removed).”
Anthropic spokesperson Gabby Curtis told Forbes that Claude conversations were visible on Google and Bing only because users had posted links to the conversations online or on social media.
“We give people control over sharing their Claude conversations publicly, and in keeping with our privacy principles, we do not share chat directories
or sitemaps of shared chats with search engines like Google and actively block them from crawling our site,” Curtis told Forbes in an email.
Data retention also will change. For those who agree to let Anthropic use their data to train models, the company will retain that data for five years. The company said users retain “complete” control over how it uses the data.
“If you change your training preference, delete individual chats, or delete your account, we’ll exclude your data from future model training,” the email said.
Anthropic also will restrict service to entities that are more than 50% owned by companies headquartered in unsupported regions, such as China, regardless of where the entities themselves operate.
A Chinese media outlet reported that Singapore-based Trae, an AI-powered code editor that China's ByteDance launched for overseas users, is known to use OpenAI's GPT and Anthropic's Claude models. A number of Trae users have raised refund requests with Trae staff on developer platforms over concerns that their access to Claude will end.