Google DeepMind Proposes AI ‘Monitors’ To Police Hyperintelligent Models

Google DeepMind released a paper on April 2 introducing a new approach to securing frontier generative AI. The paper focuses on two of DeepMind’s four key risk areas, misuse and misalignment. The full list is “misuse, misalignment, mistakes, and structural risks.”

DeepMind is looking beyond current frontier AI to artificial general intelligence (AGI), meaning AI with human-level intelligence, which could revolutionize healthcare and other industries or trigger technological chaos. Some skeptics doubt that AGI of that magnitude will ever exist.

Asserting that human-like AGI is imminent and must be prepared for is a hype strategy as old as OpenAI, which started out with a similar mission statement in 2015. Although panic over hyperintelligent AI may not be warranted, research like DeepMind’s contributes to a broader, multipronged cybersecurity strategy for generative AI.

Preventing bad actors from misusing generative AI

Misuse and misalignment are the two risk factors that would arise from intent: misuse involves a malicious human threat actor, while misalignment describes scenarios where the AI follows instructions in ways that make it an adversary. “Mistakes” (unintentional errors) and “structural risks” (problems that arise without any single responsible actor, perhaps from conflicting incentives) complete the four-part framework.

To address misuse, DeepMind proposes the following strategies:

Locking down the model weights of advanced AI systems
Conducting threat modeling research to identify vulnerable areas
Creating a cybersecurity evaluation framework tailored to advanced AI
Exploring other, unspecified mitigations

DeepMind acknowledges that misuse already occurs with today’s generative AI, from deepfakes to phishing scams. It also cites the spread of misinformation, manipulation of public perception, and “unintended societal consequences” as present-day concerns that could scale up significantly if AGI becomes a reality.

SEE: OpenAI raised $40 billion at a $300 billion valuation this week, but some of the money is contingent on the organization going for-profit.

Preventing generative AI from taking unwanted actions on its own

Misalignment could occur when an AI conceals its true intent from users or bypasses security measures as part of a task. DeepMind suggests that “amplified oversight,” testing an AI’s output against its intended objective, might mitigate such risks. Implementing it is challenging, though: DeepMind is still exploring what kinds of example situations an AI should be trained on.

One proposal involves deploying a “monitor,” another AI system trained to detect actions that don’t align with DeepMind’s goals. Given the complexity of generative AI, such a monitor would need precise training to tell acceptable actions from unacceptable ones and to escalate questionable behavior for human review.
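To make the monitor idea concrete, here is a minimal sketch of how such a gatekeeper might route an agent’s proposed actions into three buckets: allow, escalate to a human, or block. DeepMind’s paper does not prescribe this design. Everything here is hypothetical: the `review` and `triage` functions, the two thresholds, and especially `toy_score`, a keyword lookup standing in for what would really be a trained AI model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Iterable, List


class Verdict(Enum):
    ALLOW = "allow"        # action proceeds without intervention
    ESCALATE = "escalate"  # action is held for human review
    BLOCK = "block"        # action is refused outright


@dataclass
class MonitorPolicy:
    # Hypothetical risk thresholds on a 0.0-1.0 scale.
    escalate_at: float = 0.3
    block_at: float = 0.8


def review(action: str, score_fn: Callable[[str], float], policy: MonitorPolicy) -> Verdict:
    """Score one proposed action and map the risk score to a verdict."""
    risk = score_fn(action)
    if risk >= policy.block_at:
        return Verdict.BLOCK
    if risk >= policy.escalate_at:
        return Verdict.ESCALATE
    return Verdict.ALLOW


def triage(
    actions: Iterable[str], score_fn: Callable[[str], float], policy: MonitorPolicy
) -> Dict[Verdict, List[str]]:
    """Sort a batch of actions into allow / escalate / block queues."""
    queues: Dict[Verdict, List[str]] = {v: [] for v in Verdict}
    for action in actions:
        queues[review(action, score_fn, policy)].append(action)
    return queues


def toy_score(action: str) -> float:
    """Hypothetical stand-in for a trained monitor model: keyword risk weights."""
    flagged = {"disable_logging": 0.9, "read_credentials": 0.5}
    return max((w for k, w in flagged.items() if k in action), default=0.0)


if __name__ == "__main__":
    batch = [
        "summarize the quarterly report",
        "read_credentials for deploy",
        "disable_logging then run job",
    ]
    for verdict, items in triage(batch, toy_score, MonitorPolicy()).items():
        print(verdict.value, items)
```

The middle “escalate” band is the piece that reflects the article’s point: rather than forcing the monitor to make every call itself, uncertain cases are handed to a person. The hard part in practice would be training the scoring model, which this sketch deliberately leaves as a placeholder.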
