Paper Page - Transformer Copilot: Learning From The Mistake Log In LLM Fine-tuning

The Transformer Copilot framework enhances large language model performance through a Copilot model that refines the Pilot’s logits based on a Mistake Log, leading to consistent performance improvements across various benchmarks.

Large language models are typically adapted to downstream tasks through
supervised fine-tuning on domain-specific data. While standard fine-tuning
focuses on minimizing generation loss to optimize model parameters, we take a
deeper step by retaining and leveraging the model’s own learning signals,
analogous to how human learners reflect on past mistakes to improve future
performance. We first introduce the concept of a Mistake Log to systematically
track the model’s learning behavior and recurring errors throughout
fine-tuning. Treating the original transformer-based model as the Pilot, we
correspondingly design a Copilot model to refine the Pilot’s inference
performance via logits rectification. We name the overall Pilot-Copilot
framework the Transformer Copilot, which introduces (i) a novel Copilot model
design, (ii) a joint training paradigm where the Copilot continuously learns
from the evolving Mistake Log alongside the Pilot, and (iii) a fused inference
paradigm where the Copilot rectifies the Pilot’s logits for enhanced
generation. We provide both theoretical and empirical analyses of our new
learning framework. Experiments on 12 benchmarks spanning commonsense,
arithmetic, and recommendation tasks demonstrate that Transformer Copilot
consistently improves performance by up to 34.5%, while introducing marginal
computational overhead to Pilot models and exhibiting strong scalability and
transferability.
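
The two mechanisms named in the abstract, logging the Pilot's errors and rectifying its logits at inference, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the names `MistakeLog`, `record`, and `rectify`, the choice of "one-hot target minus predicted probability" as the logged error, the additive fusion rule, and the weight `lam` are all hypothetical. In particular, a real Copilot would be a trained model that predicts the correction; here the logged error itself stands in for that prediction.

```python
import math
from dataclasses import dataclass, field


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class MistakeLog:
    """Hypothetical container for per-step token-level errors."""
    entries: list = field(default_factory=list)

    def record(self, step, pilot_logits, target_index):
        # Assumed error definition: one-hot target minus predicted probability.
        probs = softmax(pilot_logits)
        error = [
            (1.0 if i == target_index else 0.0) - p
            for i, p in enumerate(probs)
        ]
        self.entries.append({"step": step, "error": error})
        return error


def rectify(pilot_logits, copilot_correction, lam=1.0):
    """Assumed additive fusion: pilot logits plus a weighted correction."""
    return [z + lam * c for z, c in zip(pilot_logits, copilot_correction)]


# Toy example: the Pilot prefers token 0, but the correct token is 1.
log = MistakeLog()
pilot_logits = [2.0, 1.0, 0.5]
error = log.record(step=0, pilot_logits=pilot_logits, target_index=1)

# Stand-in for a trained Copilot: reuse the logged error as the correction.
fused = rectify(pilot_logits, error, lam=2.0)
```

In this toy case the correction raises the logit of the target token and lowers the others, so the fused logits rank token 1 first. How strongly to weight the correction is a design choice that this sketch leaves to the hypothetical `lam` parameter.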
