Large Language Models (LLMs) achieve strong performance on diverse tasks but
often exhibit cognitive inertia, struggling to follow instructions that
conflict with the standardized patterns learned during supervised fine-tuning
(SFT). To evaluate this limitation, we propose Inverse IFEval, a benchmark that
measures models' Counter-intuitive Ability, i.e., their capacity to override
training-induced biases and comply with adversarial instructions. Inverse
IFEval introduces eight types of such challenges, including Question
Correction, Intentional Textual Flaws, Code without Comments, and
Counterfactual Answering. Using a human-in-the-loop pipeline, we construct a
dataset of 1012 high-quality Chinese and English questions across 23 domains,
evaluated under an optimized LLM-as-a-Judge framework. Experiments on leading
LLMs demonstrate the necessity of our proposed Inverse IFEval
benchmark. Our findings emphasize that future alignment efforts should not only
pursue fluency and factual correctness but also ensure adaptability in
unconventional contexts. We hope that Inverse IFEval serves as both a
diagnostic tool and a foundation for developing methods that mitigate cognitive
inertia, reduce overfitting to narrow patterns, and ultimately enhance the
instruction-following reliability of LLMs in diverse and unpredictable
real-world scenarios.