Paper Page - Taming Polysemanticity In LLMs: Provable Feature Recovery Via Sparse Autoencoders

Existing Sparse Autoencoder (SAE) training algorithms often lack rigorous mathematical guarantees for feature recovery. Empirically, methods such as L1 regularization and TopK activation are sensitive to hyperparameter tuning and can produce inconsistent features. Our work addresses these theoretical and practical issues with the following contributions:

📊 A novel statistical framework that rigorously formalizes feature recovery by modeling polysemantic features as sparse combinations of underlying monosemantic concepts, and establishes a precise notion of feature identifiability.

🛠️ An innovative SAE training algorithm, Group Bias Adaptation (GBA), which adaptively adjusts neural network bias parameters to enforce optimal activation sparsity, allowing distinct groups of neurons to target different activation frequencies.

🧮 The first theoretical guarantee that an SAE training algorithm provably recovers all monosemantic features when the input data is sampled from our proposed statistical model.

🚀 Superior empirical performance on LLMs up to 1.5B parameters, where GBA achieves the best sparsity-loss trade-off while learning more consistent features than benchmark methods.
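
To make the bias-adaptation idea concrete, here is a minimal sketch, not the paper's actual GBA procedure. It assumes inputs drawn as sparse nonnegative combinations of unit-norm concept vectors (loosely mirroring the statistical model described above), frozen encoder weights, ReLU neurons, and a simple sign-based bias update. The function names, the two groups, the target frequencies (0.05 and 0.20), and the step size are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_sparse_mixtures(n_samples, n_concepts, dim, n_active, rng):
    """Draw inputs as sparse nonnegative combinations of unit concept vectors."""
    concepts = rng.standard_normal((n_concepts, dim))
    concepts /= np.linalg.norm(concepts, axis=1, keepdims=True)
    coeffs = np.zeros((n_samples, n_concepts))
    for i in range(n_samples):
        # Each sample mixes only a few concepts (assumed sparsity pattern).
        active = rng.choice(n_concepts, size=n_active, replace=False)
        coeffs[i, active] = rng.uniform(0.5, 1.5, size=n_active)
    return coeffs @ concepts


def firing_frequency(pre_acts, biases):
    """Fraction of samples on which each ReLU neuron is active."""
    return ((pre_acts + biases) > 0).mean(axis=0)


def adapt_group_biases(pre_acts, biases, group_ids, target_freqs, step=0.01):
    """Move each bias one step toward its group's target firing frequency."""
    freq = firing_frequency(pre_acts, biases)
    targets = target_freqs[group_ids]
    # Firing too often -> lower the bias; firing too rarely -> raise it.
    return biases - step * np.sign(freq - targets)


def run_demo(seed=0, n_iters=500):
    rng = np.random.default_rng(seed)
    x = sample_sparse_mixtures(2000, n_concepts=32, dim=16, n_active=3, rng=rng)
    weights = rng.standard_normal((8, 16))
    weights /= np.linalg.norm(weights, axis=1, keepdims=True)
    pre_acts = x @ weights.T                 # weights stay frozen in this sketch
    group_ids = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    target_freqs = np.array([0.05, 0.20])    # illustrative per-group targets
    biases = np.zeros(8)
    for _ in range(n_iters):
        biases = adapt_group_biases(pre_acts, biases, group_ids, target_freqs)
    return firing_frequency(pre_acts, biases), target_freqs[group_ids]
```

Calling `run_demo()` returns the final per-neuron firing frequencies alongside their group targets. In this toy setup the biases alone steer each group toward its own activation frequency; a full SAE would also train the encoder and decoder weights, which this sketch deliberately leaves out.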
