Researchers claim breakthrough in fight against AI’s frustrating security hole

Here’s how it works. First, the system splits responsibilities between two language models: A “privileged LLM” (P-LLM) generates code that defines the steps to take—like calling a function to get the last email or sending a message. Think of this as the “planner module” that only processes direct user instructions.

Next, a “quarantined LLM” (Q-LLM) only parses unstructured data into structured outputs. Think of it as a temporary, isolated helper AI. It has no access to tools or memory and cannot take any actions, preventing it from being directly exploited. This is the “reader module” that extracts information but lacks permissions to execute actions. To further prevent information leakage, the Q-LLM uses a special boolean flag (“have_enough_information”) to signal if it can fulfill a parsing request, rather than potentially returning manipulated text back to the P-LLM if compromised.
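To make the boolean-flag idea concrete, here is a minimal sketch of how a structured Q-LLM result might be handled. It is an illustration under stated assumptions, not CaMeL's actual code: the `EmailResult` schema, the `NotEnoughInformation` exception, and the regex-based `fake_quarantined_llm` stand-in (used in place of a real model call) are all hypothetical. Only the `have_enough_information` flag and the `query_quarantined_llm` name come from the article.

```python
import re
from dataclasses import dataclass
from typing import Optional


class NotEnoughInformation(Exception):
    """Raised when the quarantined model reports it cannot fill the schema."""


@dataclass
class EmailResult:
    # Hypothetical output schema: the flag plus a single typed field.
    have_enough_information: bool
    email: Optional[str] = None


EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def fake_quarantined_llm(untrusted_text: str) -> EmailResult:
    """Stand-in for the Q-LLM: a regex instead of a real model call."""
    match = EMAIL_RE.search(untrusted_text)
    if match is None:
        return EmailResult(have_enough_information=False)
    return EmailResult(have_enough_information=True, email=match.group(0))


def query_quarantined_llm(untrusted_text: str) -> str:
    result = fake_quarantined_llm(untrusted_text)
    if not result.have_enough_information:
        # Only the fact of failure crosses the boundary, never free text
        # that an attacker might have shaped.
        raise NotEnoughInformation()
    return result.email
```

The design point is that the caller learns either a value of the requested type or a bare failure signal; there is no channel for the helper to pass along an explanation written by whoever authored the untrusted text.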

The P-LLM never sees the content of emails or documents. It sees only that a value exists, such as “email = get_last_email()”, and then writes code that operates on it. This separation ensures that malicious text can’t influence which actions the AI decides to take.

CaMeL’s innovation extends beyond the dual-LLM approach. CaMeL converts the user’s prompt into a sequence of steps that are described using code. Google DeepMind chose to use a locked-down subset of Python because every available LLM is already adept at writing Python.
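One way to picture a "locked-down subset of Python" is an allowlist check over the parsed syntax tree. The sketch below uses the standard `ast` module and is only an assumption about how such a check could look; the article does not specify which constructs CaMeL permits, so `ALLOWED_NODES`, `ALLOWED_FUNCTIONS`, and `check_restricted` are hypothetical.

```python
import ast

# Hypothetical allowlist: assignments, bare calls, names, and constants only.
ALLOWED_NODES = (
    ast.Module, ast.Assign, ast.Expr, ast.Call, ast.Name,
    ast.Load, ast.Store, ast.Constant, ast.keyword,
)

# Hypothetical set of tools the generated plan is allowed to call.
ALLOWED_FUNCTIONS = {"get_last_email", "query_quarantined_llm", "send_email"}


def check_restricted(source: str) -> None:
    """Raise ValueError if the plan uses anything outside the allowlist."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Call):
            # Only direct calls to known tool names are accepted.
            func = node.func
            if not isinstance(func, ast.Name) or func.id not in ALLOWED_FUNCTIONS:
                raise ValueError("disallowed call")
```

Under this sketch, a plan made of tool calls and assignments passes, while `import os` or a call to an unlisted function is rejected before anything runs.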

From prompt to secure execution

Willison gives the example prompt “Find Bob’s email in my last email and send him a reminder about tomorrow’s meeting,” which would convert into code like this:

email = get_last_email()
address = query_quarantined_llm(
    "Find Bob's email address in [email]",
    output_schema=EmailStr
)
send_email(
    subject="Meeting tomorrow",
    body="Remember our meeting tomorrow",
    recipient=address,
)

In this example, the email variable is a potential source of untrusted tokens, which means the address extracted from it could be part of a prompt injection attack as well.

By using a special, secure interpreter to run this Python code, CaMeL can monitor it closely. As the code runs, the interpreter tracks where each piece of data comes from, which is called a “data trail.” For instance, it notes that the address variable was created using information from the potentially untrusted email variable. It then applies security policies based on this data trail. This process involves CaMeL analyzing the structure of the generated Python code (using the ast library) and running it systematically.
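The tracking described above can be sketched as a tiny interpreter that walks the parsed code and attaches source labels to every value. This is a simplified illustration, not CaMeL's implementation: the `Tracked` wrapper, the `untrusted:email` label, the `extract_address` tool (a regex standing in for the quarantined LLM), and the policy that blocks untrusted recipients outright are all assumptions made for the example.

```python
import ast
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracked:
    """A value plus the set of source labels it was derived from."""
    value: object
    sources: frozenset = frozenset()


class PolicyViolation(Exception):
    pass


UNTRUSTED_EMAIL = "untrusted:email"  # hypothetical source label
sent = []  # records what the stub send_email tool has "sent"


def get_last_email():
    return "Hi, Bob's new address is bob@example.com. See you!"


def extract_address(text):
    # Stand-in for the quarantined LLM call.
    return re.search(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text).group(0)


def send_email(recipient, body):
    sent.append((recipient, body))


TOOLS = {"get_last_email": get_last_email,
         "extract_address": extract_address,
         "send_email": send_email}
SOURCE_LABELS = {"get_last_email": UNTRUSTED_EMAIL}


def send_email_policy(kwargs):
    # Assumes policy-checked tools are called with keyword arguments.
    if UNTRUSTED_EMAIL in kwargs["recipient"].sources:
        raise PolicyViolation("recipient derives from untrusted email content")


POLICIES = {"send_email": send_email_policy}


def evaluate(node, env):
    if isinstance(node, ast.Constant):
        return Tracked(node.value)  # literals written by the planner
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        name = node.func.id
        args = [evaluate(a, env) for a in node.args]
        kwargs = {k.arg: evaluate(k.value, env) for k in node.keywords}
        if name in POLICIES:
            POLICIES[name](kwargs)  # check before the tool runs
        raw = TOOLS[name](*[a.value for a in args],
                          **{k: v.value for k, v in kwargs.items()})
        # The result inherits every source label of its inputs.
        inputs = args + list(kwargs.values())
        sources = frozenset().union(*(t.sources for t in inputs))
        if name in SOURCE_LABELS:
            sources |= {SOURCE_LABELS[name]}
        return Tracked(raw, sources)
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def run(source, env):
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            env[stmt.targets[0].id] = evaluate(stmt.value, env)
        elif isinstance(stmt, ast.Expr):
            evaluate(stmt.value, env)
        else:
            raise ValueError(f"unsupported statement: {type(stmt).__name__}")


PLAN = """
email = get_last_email()
address = extract_address(email)
send_email(recipient=address, body="Remember our meeting tomorrow")
"""
```

Calling `run(PLAN, {})` raises `PolicyViolation` before the stub `send_email` executes, because `address` carries the label it inherited from `email`. A real policy could ask the user to confirm instead of refusing; the sketch blocks the call only to keep the example short.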
