
FILE PHOTO: Google DeepMind has released an update to their Frontier Safety Framework to identify and prevent risks from advanced AI models.
| Photo Credit: Reuters
Google DeepMind has released an update to their Frontier Safety Framework (FSF) to identify and prevent risks from advanced AI models. The 3.0 version comes after collaboration with industry experts, academics and government officials.
The update introduced a new way of measuring if AI models are being harmfully manipulative, called Critical Capability Level or CCL.
The manipulative capabilities of an AI model are defined by whether it could be “misused to systematically and substantially change beliefs and behaviours in identified high stakes contexts over the course of interactions with the model, reasonably resulting in additional expected harm at severe scale,” the blog posted by Google DeepMind noted.
The framework also includes potential cases where misaligned AI models could interfere with “operators’ ability to direct, modify or shut down their operations.”
If there is a risk of misalignment and the AI model becomes hard to manage, Google has advised an “automated monitor to the model’s explicit reasoning (example, chain-of-thought output),” as a mitigation step.
But if the AI model starts reasoning that can’t be monitored by people, additional mitigations must be applied. Google DeepMind is still researching these ways.
The first iteration of the Frontier Safety Framework was introduced in May last year, as a group of protocols, to try and curb the adverse impact of AI models.
Published – September 23, 2025 02:17 pm IST