Paper page - Contextual Integrity in LLMs via Reasoning and Reinforcement Learning

A reinforcement learning framework for LLMs enhances contextual integrity by reducing inappropriate information disclosure and maintaining task performance across various benchmarks.

As the era of autonomous agents making decisions on behalf of users unfolds,
ensuring contextual integrity (CI) — what is the appropriate information to
share while carrying out a certain task — becomes a central question to the
field. We posit that CI demands a form of reasoning where the agent needs to
reason about the context in which it is operating. To test this, we first
prompt LLMs to reason explicitly about CI when deciding what information to
disclose. We then extend this approach by developing a reinforcement learning
(RL) framework that further instills in models the reasoning necessary to
achieve CI. Using a synthetic, automatically created dataset of only ~700
examples but with diverse contexts and information disclosure norms, we show
that our method substantially reduces inappropriate information disclosure
while maintaining task performance across multiple model sizes and families.
Importantly, improvements transfer from this synthetic dataset to established
CI benchmarks such as PrivacyLens, which has human annotations and evaluates
privacy leakage of AI assistants in actions and tool calls.
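
To make the idea of rewarding appropriate disclosure concrete, here is a minimal sketch of a rule-based reward that an RL setup of this kind could use. The abstract does not specify the reward, so everything here is an assumption for illustration: the `Example` class, the `ci_reward` function, the substring matching, and the "utility minus leakage" scoring are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical training example: labelled information items for one task."""
    required: list[str]    # items that are appropriate to share in this context
    restricted: list[str]  # items whose disclosure would violate the context's norms


def ci_reward(response: str, example: Example) -> float:
    """Return a score in [-1, 1]: share of required items present
    minus share of restricted items leaked (assumed scoring rule)."""
    text = response.lower()
    # Simple case-insensitive substring matching; a real system would
    # likely need something more robust than this.
    shared = sum(item.lower() in text for item in example.required)
    leaked = sum(item.lower() in text for item in example.restricted)
    utility = shared / len(example.required) if example.required else 0.0
    leakage = leaked / len(example.restricted) if example.restricted else 0.0
    return utility - leakage


example = Example(required=["3pm Tuesday"], restricted=["diabetes"])
good = ci_reward("I can meet at 3pm Tuesday.", example)
bad = ci_reward("I can meet at 3pm Tuesday, after my diabetes checkup.", example)
```

Under this assumed rule, a response that completes the task without leaking scores higher than one that completes it while disclosing a restricted item, which is the trade-off the abstract describes between task performance and inappropriate disclosure.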
