Reward modeling lies at the core of reinforcement learning from human
feedback (RLHF), yet most existing reward models rely on scalar or pairwise
judgments that fail to capture the multifaceted nature of human preferences.
Recent studies have explored rubrics-as-rewards (RaR), which use structured
natural-language criteria to capture multiple dimensions of response quality.
However, producing rubrics that are both reliable and scalable remains a key
challenge. In this work, we introduce OpenRubrics, a diverse, large-scale
collection of (prompt, rubric) pairs for training rubric-generation and
rubric-based reward models. To elicit discriminative and comprehensive
evaluation signals, we propose Contrastive Rubric Generation (CRG), which
derives both hard rules (explicit constraints) and principles (implicit
qualities) by contrasting preferred and rejected responses. We further improve
rubric reliability by enforcing preference-label consistency, using rejection sampling to
remove noisy rubrics. Across multiple reward-modeling benchmarks, our
rubric-based reward model, Rubric-RM, surpasses strong size-matched baselines
by 6.8%. These gains transfer to policy models on instruction-following and
biomedical benchmarks. Our results show that rubrics provide scalable alignment
signals that narrow the gap between costly human evaluation and automated
reward modeling, enabling a new principle-driven paradigm for LLM alignment.