
Anthropic has said that its Claude Opus 4 and 4.1 models will now have the ability to end conversations in “extreme cases of persistently harmful or abusive user interactions.” The AI firm announced in a blog post that the move was meant to protect the welfare of the AI models, which showed signs of distress when users insisted on continuing such conversations even after Claude refused.
The models will end a chat only in rare “extreme edge cases,” Anthropic said, such as “requests for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror.”
The firm said that during pre-deployment testing of Claude Opus 4, the AI model’s self-reports and behavioural preferences showed it was in “apparent distress” when engaged in such conversations.
Claude has also been “directed not to use this ability in cases where users might be at imminent risk of harming themselves or others.”
Once the AI model ends a chat, the user will not be able to send new messages in that conversation. However, older chats will remain accessible, and the user will be able to start a new conversation immediately.
Users can still go back to the ended chat and edit and retry previous messages, creating new branches from it, so that important conversations are not lost.
Anthropic said that the feature is still being tested and may change based on user feedback.
Published – August 19, 2025 01:16 pm IST