Paper Page - Teach Old SAEs New Domain Tricks With Boosting

A residual learning approach enhances Sparse Autoencoders to capture domain-specific features without retraining, improving interpretability and performance on specialized domains.

Sparse Autoencoders have emerged as powerful tools for interpreting the
internal representations of Large Language Models, yet they often fail to
capture domain-specific features not prevalent in their training corpora. This
paper introduces a residual learning approach that addresses this feature
blindness without requiring complete retraining. We propose training a
secondary SAE specifically to model the reconstruction error of a pretrained
SAE on domain-specific texts, effectively capturing features missed by the
primary model. By summing the outputs of both models during inference, we
demonstrate significant improvements in both LLM cross-entropy and explained
variance metrics across multiple specialized domains. Our experiments show that
this method efficiently incorporates new domain knowledge into existing SAEs
while maintaining their performance on general tasks. This approach enables
researchers to selectively enhance SAE interpretability for specific domains of
interest, opening new possibilities for targeted mechanistic interpretability
of LLMs.

Source link

What's Hot

OpenAI GPT-5 Possesses Doctorate-Level Abilities? Google DeepMind CEO: Nonsense_the_this_Market

Why the Dongfeng Yipai eπ007 Became a Best-Seller in the 150,000 Yuan New Energy Sedan Market_The_With

Innovaccer Revenue Surges Amidst $275M AI Funding Round

Paper page – Teach Old SAEs New Domain Tricks with Boosting

Research Paper – Takara TLDR

2D Gaussian Splatting with Semantic Alignment for Image Inpainting – Takara TLDR

The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward – Takara TLDR

Ohio Auction of Two Paintings Looted By Nazis Halted By Foundation

Lee Ufan Painting at Center of Bribery Investigation in Korea

Drought Reveals 40 Ancient Tombs in Northern Iraqi Reservoir

Artifacts Removed from Gaza Building Before Suspected Israeli Strike

OpenAI GPT-5 Possesses Doctorate-Level Abilities? Google DeepMind CEO: Nonsense_the_this_Market

Why the Dongfeng Yipai eπ007 Became a Best-Seller in the 150,000 Yuan New Energy Sedan Market_The_With

Innovaccer Revenue Surges Amidst $275M AI Funding Round

What's Hot

Paper page – Teach Old SAEs New Domain Tricks with Boosting

Related Posts

Subscribe to Updates