Paper Page - Discovering Hierarchical Latent Capabilities Of Language Models Via Causal Representation Learning

A causal representation learning framework identifies a concise causal structure to explain performance variations in language models across benchmarks by controlling for base model variations.

Faithful evaluation of language model capabilities is crucial for deriving
actionable insights that can inform model development. However, rigorous causal
evaluations in this domain face significant methodological challenges,
including complex confounding effects and prohibitive computational costs
associated with extensive retraining. To tackle these challenges, we propose a
causal representation learning framework wherein observed benchmark performance
is modeled as a linear transformation of a few latent capability factors.
Crucially, these latent factors are identified as causally interrelated after
appropriately controlling for the base model as a common confounder. Applying
this approach to a comprehensive dataset encompassing over 1500 models
evaluated across six benchmarks from the Open LLM Leaderboard, we identify a
concise three-node linear causal structure that reliably explains the observed
performance variations. Further interpretation of this causal structure
provides substantial scientific insights beyond simple numerical rankings:
specifically, we reveal a clear causal direction starting from general
problem-solving capabilities, advancing through instruction-following
proficiency, and culminating in mathematical reasoning ability. Our results
underscore the essential role of carefully controlling base model variations
during evaluation, a step critical to accurately uncovering the underlying
causal relationships among latent model capabilities.
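
The modeling idea in the abstract can be illustrated with a small simulation. The sketch below is not the paper's procedure or data. It assumes a hypothetical three-factor linear chain (z1 -> z2 -> z3), a random mixing matrix `G` mapping factors to six benchmark scores, and a per-base-model offset acting as the common confounder. The edge weights (0.8, 0.6), noise levels, and the within-base centering step are illustrative assumptions chosen only to show why controlling for the base model matters.

```python
import numpy as np

def simulate(n_base=20, n_per_base=50, noise=0.05, seed=0):
    """Simulate latent factors and benchmark scores (all values hypothetical)."""
    rng = np.random.default_rng(seed)
    n = n_base * n_per_base
    base_id = np.repeat(np.arange(n_base), n_per_base)
    # Assumed linear causal chain among three latent capability factors.
    e = rng.normal(0.0, 1.0, size=(n, 3))
    z1 = e[:, 0]
    z2 = 0.8 * z1 + e[:, 1]
    z3 = 0.6 * z2 + e[:, 2]
    # Base model as a common confounder: one offset per base model per factor.
    base_effect = rng.normal(0.0, 2.0, size=(n_base, 3))
    z = np.column_stack([z1, z2, z3]) + base_effect[base_id]
    # Observed scores: linear transformation of the factors plus small noise.
    G = rng.normal(0.0, 1.0, size=(6, 3))
    x = z @ G.T + rng.normal(0.0, noise, size=(n, 6))
    return x, z, base_id

def center_within_base(a, base_id):
    """Subtract each base model's mean (a simple fixed-effects control)."""
    out = np.asarray(a, dtype=float).copy()
    for b in np.unique(base_id):
        mask = base_id == b
        out[mask] -= out[mask].mean(axis=0)
    return out

def slope(a, b):
    """Least-squares slope of b regressed on a."""
    return float(np.cov(a, b)[0, 1] / np.var(a, ddof=1))

def top_k_variance_share(x, k=3):
    """Share of total variance captured by the top-k singular directions."""
    s = np.linalg.svd(x - x.mean(axis=0), compute_uv=False)
    return float((s[:k] ** 2).sum() / (s ** 2).sum())

x, z, base_id = simulate()

# Pooled estimate ignores the confounder; within-base estimate controls for it.
pooled_slope = slope(z[:, 1], z[:, 2])
z_ctrl = center_within_base(z, base_id)
within_slope = slope(z_ctrl[:, 1], z_ctrl[:, 2])

# Scores are approximately rank 3 because only three factors drive them.
share = top_k_variance_share(center_within_base(x, base_id), k=3)

print(f"pooled slope z2->z3: {pooled_slope:.2f}")
print(f"within-base slope z2->z3: {within_slope:.2f}")
print(f"variance share of top 3 components: {share:.3f}")
```

In this toy setup the within-base slope should land near the assumed edge weight of 0.6, while the pooled slope is generally distorted by the base-model offsets. In real data the factors are unobserved and must be recovered from the scores, which is the harder problem the paper addresses.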
