Paper page - CompassJudger-2: Towards Generalist Judge Model via Verifiable Rewards

CompassJudger-2, a generalist judge model, achieves superior performance across multiple benchmarks through task-driven data curation, verifiable rewards, and a refined learning objective with margin policy gradient loss.

Recently, the role of LLM-as-judge in evaluating large language models has
gained prominence. However, current judge models suffer from narrow
specialization and limited robustness, undermining their capacity for
comprehensive evaluations. In this work, we present CompassJudger-2, a novel
generalist judge model that overcomes these limitations via a task-driven,
multi-domain data curation strategy. Central to our approach is supervising
judgment tasks with verifiable rewards, guiding intrinsic critical reasoning
through rejection sampling to foster robust, generalizable judgment
capabilities. We introduce a refined learning objective with margin policy
gradient loss to enhance performance. Empirically, CompassJudger-2 achieves
superior results across multiple judge and reward benchmarks, and our 7B model
achieves judgment accuracy competitive with significantly larger models
such as DeepSeek-V3 and Qwen3-235B-A22B. Additionally, we propose JudgerBenchV2, a
comprehensive benchmark evaluating cross-domain judgment accuracy and rank
consistency to standardize judge model evaluation. These contributions advance
robust, scalable LLM judgment and establish new performance and evaluation
standards.
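
To make the idea of supervising judgments with verifiable rewards through rejection sampling more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract gives no formulas or code, so every name below (`verifiable_reward`, `rejection_sample`, `toy_judge`) is hypothetical, the "judge" is a random stand-in rather than an LLM, and the reward is assumed to be a simple exact match between the judge's verdict and a known gold label.

```python
import random
from typing import Callable, List, Tuple

# A sampled judgment: (critique text, final verdict such as "A" or "B").
Judgment = Tuple[str, str]


def verifiable_reward(verdict: str, gold: str) -> int:
    """Assumed reward: 1 if the verdict matches the gold label, else 0."""
    return 1 if verdict.strip().upper() == gold.strip().upper() else 0


def rejection_sample(
    sample_fn: Callable[[], Judgment], gold: str, n: int = 8
) -> List[Judgment]:
    """Draw n candidate judgments and keep only those with reward 1."""
    kept: List[Judgment] = []
    for _ in range(n):
        critique, verdict = sample_fn()
        if verifiable_reward(verdict, gold) == 1:
            kept.append((critique, verdict))
    return kept


def toy_judge(rng: random.Random) -> Callable[[], Judgment]:
    """Hypothetical stand-in for a judge model: picks a verdict at random."""

    def sample() -> Judgment:
        verdict = rng.choice(["A", "B"])
        return (f"Reasoning that favours response {verdict}.", verdict)

    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    kept = rejection_sample(toy_judge(rng), gold="A", n=16)
    # Only judgments whose verdict agrees with the gold label survive.
    print(f"kept {len(kept)} of 16 sampled judgments")
```

In a real pipeline the retained critiques would presumably serve as training targets, so the model learns reasoning that leads to verifiably correct verdicts; how the paper combines this with its margin policy gradient loss is not specified in the abstract.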
