Existing Sparse Autoencoder (SAE) training algorithms often lack rigorous mathematical guarantees for feature recovery. Empirically, methods such as L1 regularization and TopK activation are sensitive to hyperparameter tuning and can learn inconsistent features across training runs. Our work addresses these theoretical and practical issues with the following contributions:
📊 A novel statistical framework that rigorously formalizes feature recovery by modeling polysemantic features as sparse combinations of underlying monosemantic concepts (see the sketch after this list), and establishes a precise notion of feature identifiability.
🛠️ An innovative SAE training algorithm, Group Bias Adaptation (GBA), which adaptively adjusts the network's bias parameters to control activation sparsity, allowing distinct groups of neurons to target different activation frequencies (a minimal code sketch follows the list).
🧮 The first theoretical guarantee that an SAE training algorithm provably recovers all monosemantic features when the input data is sampled from our proposed statistical model.
🚀 Superior empirical performance on LLMs with up to 1.5B parameters, where GBA achieves the best sparsity-loss trade-off while learning more consistent features than benchmark methods.
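For intuition, here is a minimal sketch of the kind of generative model such a framework formalizes, assuming a linear sparse-combination form; the symbols below are illustrative notation rather than the paper's exact definitions: $x \in \mathbb{R}^d$ is an observed (polysemantic) activation, the $m_j$ are unknown monosemantic feature directions, the $z_j$ are sparse coefficients, and $\epsilon$ is noise:

$$
x \;=\; \sum_{j=1}^{K} z_j\, m_j \;+\; \epsilon,
\qquad \|z\|_0 \le s \ll K.
$$

Under a model of this shape, feature recovery means learning decoder directions that match the true $m_j$ up to permutation and rescaling, which is roughly the sense in which identifiability can be made precise.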
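The bias-adaptation idea behind GBA can likewise be pictured with a short sketch. The code below is a hypothetical, minimal version assuming a ReLU encoder and a simple proportional update rule; the function name, the update rule, and all parameter names (`group_ids`, `target_freqs`, etc.) are assumptions for illustration, not the paper's implementation:

```python
import torch

def group_bias_adaptation(pre_acts, bias, group_ids, target_freqs, lr=0.01):
    """Nudge each neuron's bias so its activation frequency moves toward
    its group's target frequency (illustrative sketch, not the paper's
    exact update rule).

    pre_acts:     (batch, n_neurons) encoder pre-activations, before bias
    bias:         (n_neurons,) bias parameters, updated in place
    group_ids:    (n_neurons,) long tensor assigning each neuron to a group
    target_freqs: (n_groups,) desired activation frequency per group
    """
    # Fraction of the batch on which each neuron fires under ReLU(pre + bias).
    freqs = ((pre_acts + bias) > 0).float().mean(dim=0)
    # Each neuron inherits the target frequency of its group.
    targets = target_freqs[group_ids]
    # Fires too often -> lower the bias; fires too rarely -> raise it.
    bias -= lr * (freqs - targets)
    return bias
```

In a full training loop, an update like this would run periodically over a buffer of pre-activations, with lower target frequencies assigned to some groups so that those neurons can specialize to rarer concepts.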