DeepMind Warns Of AIs That May Resist Shutdowns

Artificial Intelligence & Machine Learning
,
Next-Generation Technologies & Secure Development

‘More Research’ Needed, Says Google

Rashmi Ramesh (rashmiramesh_) •
September 24, 2025

DeepMind Warns of AIs That May Resist Shutdowns — Image: Shutterstock

Google DeepMind expanded its risk framework to cover scenarios where artificial intelligence models might manipulate people or resist shutdown, marking the company’s most explicit warning yet about potential misalignment.

See Also: OnDemand | Navigate the threat of AI-powered cyberattacks

An update to Google’s Frontier Safety Framework adds a new misuse class for generative AI it calls “harmful manipulation,” reflecting concerns that models could elude human control.

“Models with high manipulative capabilities” could be “misused in ways that could reasonably result in large-scale harm,” the framework now states.

The framework is organized around capability thresholds at which models could cause severe harm unless appropriate mitigations are in place. Version 3.0 adds new levels and fleshes out mitigation approaches and acknowledges gaps where researchers do not yet have fixes. Among those gaps is the prospect that a model might resist operator attempts to modify it or shut it down, a scenario the company treats as an escalated misalignment risk.

DeepMind researchers frame many of these scenarios as a malfunction rather than the model acquiring human-style intentions.

A related concern is the emergence of what DeepMind labels misalignment risk, a risk category intended for situations where models develop a baseline instrumental reasoning ability capable of undermining human control.

As a mitigation, the company recommends examining “scratchpad” outputs, which are explicit, inspectable chains of thought. Looking at interim outputs isn’t a permanent solution, Google acknowledged. Future models might simulate scratchpad outputs that don’t actually reflect a verifiable chain of thought. “Additional mitigations may be warranted – the development of which is an area of active research,” Google said.

DeepMind also details second-order risks, such as an advanced model being used to accelerate machine learning research and in doing so enabling creation of still-more-capable systems. This risk could have a “significant effect on society’s ability to adapt to and govern powerful AI models.”

The update also calls out the possibility that a model could be tuned to “systematically and substantially change people’s beliefs.” Google classified that as a “low-velocity” threat, one they expect social defenses to largely handle. It nevertheless added “harmful manipulation” as a formal misuse risk to ensure it is measured and mitigated.

“Going forward, we’ll continue to invest in this domain to better understand and measure the risks associated with harmful manipulation,” DeepMind executives Four Flynn, Helen King and Anca Dragan wrote.

Source link

What's Hot

MAPO: Mixed Advantage Policy Optimization – Takara TLDR

Apple develops a lightweight AI for protein folding prediction

Kevin Rose on Digg, reinvention, and startup investing

DeepMind Warns of AIs That May Resist Shutdowns

Apple develops a lightweight AI for protein folding prediction

Google DeepMind Upgrades Frontier AI Safety Framework to Prevent Manipulation and Shutdown Risks_the_resist_risks

Google DeepMind updates Frontier Safety Framework for AI model risks

Art Dealer Mary Boone Says Prison Was ‘Very Relaxing’

New Research Supports Theory of Hidden Vermeer Self-Portrait

John Singer Sargent Paintings Expected to Bring In $12-15 Million

John Giorno’s Decades-Long Project Dial-A-Poem Is Now Online

MAPO: Mixed Advantage Policy Optimization – Takara TLDR

Apple develops a lightweight AI for protein folding prediction

Kevin Rose on Digg, reinvention, and startup investing

What's Hot

DeepMind Warns of AIs That May Resist Shutdowns

Related Posts

Subscribe to Updates