Paper Page - SeerAttention-R: Sparse Attention Adaptation For Long Reasoning

SeerAttention-R is a sparse attention framework for reasoning models that maintains high accuracy and achieves significant speedups through optimized sparse decoding kernels.

We introduce SeerAttention-R, a sparse attention framework specifically
tailored for the long decoding of reasoning models. Extended from
SeerAttention, SeerAttention-R retains the design of learning attention
sparsity through a self-distilled gating mechanism, while removing query
pooling to accommodate auto-regressive decoding. With a lightweight plug-in
gating, SeerAttention-R is flexible and can be easily integrated into existing
pretrained models without modifying the original parameters. We demonstrate that
SeerAttention-R, trained on just 0.4B tokens, maintains near-lossless reasoning
accuracy with a 4K token budget on the AIME benchmark under large sparse attention
block sizes (64/128). Using TileLang, we develop a highly optimized sparse
decoding kernel that achieves near-theoretical speedups of up to 9x over
FlashAttention-3 on an H100 GPU at 90% sparsity. Code is available at:
https://github.com/microsoft/SeerAttention.
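
The idea of block-sparse decoding under a token budget can be illustrated with a small NumPy sketch. This is not the SeerAttention-R implementation or its TileLang kernel: the function names are hypothetical, and `pooled_gate_scores` is a simple stand-in (mean-pooled keys dotted with the query) for the learned, self-distilled gate described in the abstract. It only shows how per-block scores, a token budget, and a block size could combine to restrict one decoding step to a subset of key/value blocks, and why 90% sparsity suggests an ideal speedup of about 10x if decoding cost is assumed proportional to the key/value blocks read.

```python
import numpy as np


def pooled_gate_scores(q, K, block_size):
    """Stand-in gate (an assumption, not the learned gate): score each key
    block by the dot product of the query with the block's mean key."""
    num_blocks = -(-K.shape[0] // block_size)  # ceiling division
    return np.array([
        K[b * block_size:(b + 1) * block_size].mean(axis=0) @ q
        for b in range(num_blocks)
    ])


def select_blocks(gate_scores, block_size, token_budget):
    """Keep the highest-scoring blocks whose total size fits the budget."""
    num_keep = max(1, token_budget // block_size)
    num_keep = min(num_keep, gate_scores.shape[0])
    top = np.argsort(gate_scores)[::-1][:num_keep]
    return np.sort(top)


def sparse_decode_step(q, K, V, block_idx, block_size):
    """Softmax attention for a single query over the selected blocks only."""
    token_idx = np.concatenate([
        np.arange(b * block_size, min((b + 1) * block_size, K.shape[0]))
        for b in block_idx
    ])
    k, v = K[token_idx], V[token_idx]
    scores = k @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ v


def ideal_speedup(sparsity):
    """Upper bound on speedup, assuming cost scales with the blocks read."""
    return 1.0 / (1.0 - sparsity)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, head_dim, block_size = 1024, 64, 64
    q = rng.standard_normal(head_dim)
    K = rng.standard_normal((seq_len, head_dim))
    V = rng.standard_normal((seq_len, head_dim))

    gate = pooled_gate_scores(q, K, block_size)
    kept = select_blocks(gate, block_size, token_budget=256)
    out = sparse_decode_step(q, K, V, kept, block_size)
    print("blocks kept:", kept.tolist(), "output shape:", out.shape)
    print("ideal speedup at 90% sparsity:", ideal_speedup(0.9))
```

Assuming "4K" means 4096 tokens, a block size of 64 would keep at most 64 blocks per step in this sketch, and a block size of 128 would keep at most 32. Under the cost assumption above, the reported speedup of up to 9x at 90% sparsity sits close to the roughly 10x bound.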
