Paper page - CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards

CompassJudger-2, a generalist judge model, achieves superior performance across multiple benchmarks through task-driven data curation, verifiable rewards, and a refined learning objective with margin policy gradient loss.

Recently, the role of LLM-as-judge in evaluating large language models has
gained prominence. However, current judge models suffer from narrow
specialization and limited robustness, undermining their capacity for
comprehensive evaluations. In this work, we present CompassJudger-2, a novel
generalist judge model that overcomes these limitations via a task-driven,
multi-domain data curation strategy. Central to our approach is supervising
judgment tasks with verifiable rewards, guiding intrinsic critical reasoning
through rejection sampling to foster robust, generalizable judgment
capabilities. We introduce a refined learning objective with margin policy
gradient loss to enhance performance. Empirically, CompassJudger-2 achieves
superior results across multiple judge and reward benchmarks, and our 7B model
achieves judgment accuracy competitive with that of significantly larger models
such as DeepSeek-V3 and Qwen3-235B-A22B. Additionally, we propose JudgerBenchV2, a
comprehensive benchmark evaluating cross-domain judgment accuracy and rank
consistency to standardize judge model evaluation. These contributions advance
robust, scalable LLM judgment and establish new performance and evaluation
standards.
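
To make the idea of supervising judgments with verifiable rewards and rejection sampling more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a pairwise judging task in which the judge ends its critique with a verdict tag `[[A]]` or `[[B]]`, and in which a gold label is known for each pair. The tag format and the names `extract_verdict`, `verifiable_reward`, `rejection_sample`, and `sample_fn` are hypothetical, chosen only for illustration.

```python
import re
from typing import Callable, List, Optional

# Assumed verdict format: the judge writes [[A]] or [[B]] in its critique.
VERDICT_RE = re.compile(r"\[\[([AB])\]\]")


def extract_verdict(judgment: str) -> Optional[str]:
    """Return the last [[A]]/[[B]] verdict in the text, or None if absent."""
    matches = VERDICT_RE.findall(judgment)
    return matches[-1] if matches else None


def verifiable_reward(judgment: str, gold: str) -> float:
    """Reward 1.0 when the parsed verdict equals the gold label, else 0.0."""
    return 1.0 if extract_verdict(judgment) == gold else 0.0


def rejection_sample(sample_fn: Callable[[], str], gold: str, n: int = 8) -> List[str]:
    """Draw n judgments and keep only those whose verdict can be verified."""
    candidates = [sample_fn() for _ in range(n)]
    return [c for c in candidates if verifiable_reward(c, gold) == 1.0]


# Stand-in for sampling from a judge model: a fixed list of canned outputs.
canned = iter([
    "Response A is more accurate. [[A]]",
    "Both are fine, but B is shorter. [[B]]",
    "I cannot decide.",
    "A cites the right formula. [[A]]",
])
kept = rejection_sample(lambda: next(canned), gold="A", n=4)
print(len(kept))  # prints 2
```

In a setup like this, the retained critiques could serve as training targets, since each one reaches a verdict that has been checked against the label. How the paper combines such samples with its margin policy gradient loss is not specified in the abstract, so that step is left out here.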
