
Anthropic has said that its Claude Opus 4 and 4.1 models will now have the ability to end conversations in “extreme cases of persistently harmful or abusive user interactions.” The AI firm announced in a blog post that the move was meant to protect the welfare of the AI models, which showed signs of distress when users insisted on continuing such conversations even after Claude refused.
The models will end a chat only in rare “extreme edge cases,” Anthropic said, such as “requests for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror.”
The firm said that during pre-deployment testing of Claude Opus 4, the AI model’s self-reports and behavioural preferences showed it was in “apparent distress” when engaged in such conversations.
Claude has also been “directed not to use this ability in cases where users might be at imminent risk of harming themselves or others.”
Once the AI model ends a chat, the user will not be able to send new messages in that conversation. However, older chats will remain accessible, and the user will be able to start a new conversation immediately.
Users can still go back to the ended chat and edit and retry previous messages, creating new branches from it, so that important conversations are not lost.
Anthropic said that the feature is still being tested and may change based on user feedback.
Published – August 19, 2025 01:16 pm IST