Effective reasoning is crucial to solving complex mathematical problems.
Recent large language models (LLMs) have boosted performance by scaling
test-time computation through long chain-of-thought reasoning. However,
transformer-based models are fundamentally limited in context length by the
quadratic computational complexity of attention and the linear memory growth
of the key-value cache. In
this paper, we introduce M1, a novel hybrid linear RNN reasoning model built
on the Mamba architecture that enables memory-efficient inference. Our
approach leverages a distillation process from existing reasoning models and is
further enhanced through reinforcement learning (RL) training. Experimental
results on the AIME and MATH
benchmarks show that M1 not only outperforms previous linear RNN models but
also matches the performance of state-of-the-art DeepSeek R1 distilled
reasoning models at a similar scale. We also compare generation speed against
same-size transformers using vLLM, a high-performance general-purpose inference
engine, and observe a speedup of more than 3x. Leveraging this throughput
advantage, M1 achieves higher accuracy than DeepSeek R1 distilled transformer
reasoning models under a fixed generation-time budget using self-consistency
voting. Overall, we introduce a hybrid Mamba reasoning model
and provide a more effective approach to scaling test-time generation using
self-consistency or long chain-of-thought reasoning.
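
To make the budgeted test-time scaling concrete, here is a minimal sketch of
self-consistency voting under a fixed wall-clock budget. The `generate_answer`
callable is a hypothetical stand-in for one sampled chain-of-thought generation
that returns a final answer string (it is not part of the paper); a
higher-throughput model such as M1 simply completes more samples before the
deadline, which is how the speedup translates into accuracy.

```python
import time
from collections import Counter

def self_consistency_vote(generate_answer, prompt, time_budget_s):
    """Sample answers until the time budget is exhausted, then majority-vote.

    generate_answer: hypothetical callable wrapping one sampled
    chain-of-thought generation; returns a final answer string.
    """
    answers = []
    deadline = time.monotonic() + time_budget_s
    while time.monotonic() < deadline:
        answers.append(generate_answer(prompt))
    # Majority vote: the most frequent final answer wins.
    return Counter(answers).most_common(1)[0][0] if answers else None
```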