Supervised fine-tuning (SFT) is the standard approach for post-training large
language models (LLMs), yet it often shows limited generalization. We trace
this limitation to its default training objective: negative log likelihood
(NLL). While NLL is classically optimal when training from scratch,
post-training operates in a different paradigm, where models already encode
task-relevant priors and supervision can be long and noisy, which can violate
NLL's optimality assumptions. Motivated by this, we study a general family of
probability-based objectives and characterize their effectiveness under
different conditions. Through comprehensive experiments and extensive ablation
studies across 7 model backbones, 14 benchmarks, and 3 domains, we uncover a
critical dimension that governs objective behavior: the model-capability
continuum. Near the model-strong end, prior-leaning objectives that downweight
low-probability tokens (e.g., $-p$, $-p^{10}$, thresholded variants)
consistently outperform NLL; toward the model-weak end, NLL dominates; in
between, no single objective prevails. Our theoretical analysis further
elucidates how objectives trade places across the continuum, providing a
principled foundation for adapting objectives to model capability. Our code is
available at https://github.com/GaotangLi/Beyond-Log-Likelihood.
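As a rough illustration of the objective family described above, the sketch below contrasts per-token losses under NLL and the prior-leaning variants ($-p$, $-p^{k}$, and a thresholded form). The function name `token_loss`, the exponent `k`, and the thresholding rule are assumptions made for exposition only, not the repository's implementation.

```python
import torch

def token_loss(p: torch.Tensor, objective: str = "nll",
               k: int = 10, tau: float = 0.5) -> torch.Tensor:
    """Per-token loss given p, the probability the model assigns to the
    ground-truth token. The exponent k and threshold tau are illustrative
    choices; the paper's exact thresholded variants may differ."""
    if objective == "nll":
        # Standard SFT objective: -log p (large gradients on low-probability tokens).
        return -torch.log(p)
    if objective == "neg_p":
        # Prior-leaning objective: -p (gradients scale with p, so tokens the
        # model already finds unlikely contribute little to the update).
        return -p
    if objective == "neg_p_pow":
        # Sharper prior-leaning objective: -p^k, e.g. k = 10.
        return -p.pow(k)
    if objective == "thresholded":
        # One possible thresholded variant (an assumption, not the paper's exact
        # form): apply NLL only where the model is already confident.
        return torch.where(p >= tau, -torch.log(p), torch.zeros_like(p))
    raise ValueError(f"unknown objective: {objective}")

# Example: compare losses on a batch of target-token probabilities.
p = torch.tensor([0.9, 0.5, 0.01])
for name in ("nll", "neg_p", "neg_p_pow", "thresholded"):
    print(name, token_loss(p, name))
```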