SeerAttention-R is a sparse attention framework for reasoning models that maintains high accuracy and achieves significant speedups through optimized sparse decoding kernels.
We introduce SeerAttention-R, a sparse attention framework specifically
tailored to the long decoding phase of reasoning models. Extended from
SeerAttention, SeerAttention-R retains the design of learning attention
sparsity through a self-distilled gating mechanism, while removing query
pooling to accommodate auto-regressive decoding. With a lightweight plug-in
gating, SeerAttention-R is flexible and can be easily integrated into existing
pretrained models without modifying the original parameters. We demonstrate that
SeerAttention-R, trained on just 0.4B tokens, maintains near-lossless reasoning
accuracy with a 4K token budget on the AIME benchmark under large sparse attention
block sizes (64/128). Using TileLang, we develop a highly optimized sparse
decoding kernel that achieves near-theoretical speedups of up to 9x over
FlashAttention-3 on an H100 GPU at 90% sparsity, close to the 10x upper bound
implied by computing only 10% of the attention. Code is available at:
https://github.com/microsoft/SeerAttention.
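To make the gating idea concrete, below is a minimal PyTorch sketch of a plug-in block-sparse decoding gate of the kind the abstract describes: keys in the cache are pooled per block, a small gate scores the blocks against the current query token (no query pooling, since decoding is auto-regressive), and the top blocks within a token budget are selected. The names (`BlockGate`, `select_blocks`), the gate architecture, and the budget handling are illustrative assumptions, not the released implementation; the self-distillation training loop and the TileLang kernel are omitted.

```python
# Hedged sketch of a plug-in block-sparse decoding gate (assumed design,
# not the released SeerAttention-R code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class BlockGate(nn.Module):
    """Scores cached key blocks for the current query token.

    Assumption: keys are mean-pooled per block and both sides are projected by
    small linear layers trained via self-distillation against dense attention,
    while the base model's weights stay frozen.
    """

    def __init__(self, head_dim: int, gate_dim: int = 64):
        super().__init__()
        self.q_proj = nn.Linear(head_dim, gate_dim, bias=False)
        self.k_proj = nn.Linear(head_dim, gate_dim, bias=False)

    def forward(self, q: torch.Tensor, k_cache: torch.Tensor, block_size: int):
        # q: [heads, head_dim] for a single decoding step (no query pooling)
        # k_cache: [heads, seq_len, head_dim]
        h, s, d = k_cache.shape
        n_blocks = (s + block_size - 1) // block_size
        pad = n_blocks * block_size - s
        k = F.pad(k_cache, (0, 0, 0, pad))                   # pad seq to a block multiple
        k_blocks = k.view(h, n_blocks, block_size, d).mean(dim=2)  # pooled K per block
        scores = torch.einsum("hd,hnd->hn", self.q_proj(q), self.k_proj(k_blocks))
        return scores                                         # [heads, n_blocks]


def select_blocks(scores: torch.Tensor, budget_tokens: int, block_size: int):
    """Keep the top-scoring blocks whose total size fits the token budget."""
    k = max(1, min(scores.shape[-1], budget_tokens // block_size))
    return scores.topk(k, dim=-1).indices                     # [heads, k]


# Toy usage: one decode step over a 4K-token KV cache with 64-token blocks.
if __name__ == "__main__":
    heads, head_dim, seq_len, block_size = 8, 128, 4096, 64
    gate = BlockGate(head_dim)
    q = torch.randn(heads, head_dim)
    k_cache = torch.randn(heads, seq_len, head_dim)
    block_ids = select_blocks(gate(q, k_cache, block_size),
                              budget_tokens=1024, block_size=block_size)
    print(block_ids.shape)  # only these K/V blocks are fed to the sparse kernel
```

In a sketch like this, only the gate's two small projections would be trained, mirroring the plug-in property the abstract highlights: the pretrained attention weights are untouched, and the selected block indices would be passed to a block-sparse decoding kernel (TileLang in the paper) rather than computing dense attention over the whole KV cache.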