We introduce MetaStone-S1, a pioneering reflective generative model that enhances test-time scaling (TTS) through a new reflective generative form. This work makes three major contributions:
Reflective Generative Form: By sharing the backbone network between the policy and the process reward model (PRM), we develop a unified interface that efficiently integrates reasoning and evaluation, adding a PRM with only 53M extra parameters for efficient inference (see the first sketch after this list).
Self-supervised Process Reward Model: We introduce a novel self-supervised learning strategy that dynamically assigns outcome rewards to individual reasoning steps without the need for process-level annotations (a minimal illustration follows this list).
Scaling Law and Aha Moment: We empirically demonstrate a scaling law between reasoning computation and TTS performance, and observe an aha moment of the reflective generative form. Extensive evaluations on benchmarks such as AIME24, AIME25, LiveCodeBench, and C-EVAL show that MetaStone-S1 consistently achieves state-of-the-art performance compared with larger open-source and closed-source models.
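The sketch below illustrates how a lightweight PRM can reuse the policy model's hidden states: only a small scoring head is PRM-specific, which keeps the added parameter count small. This is a minimal PyTorch-style sketch under stated assumptions; the class name `SharedBackbonePRMHead`, the hidden size, and the two-layer head are illustrative and do not describe the released MetaStone-S1 architecture.

```python
import torch
import torch.nn as nn

class SharedBackbonePRMHead(nn.Module):
    """Hypothetical scoring head that reuses the policy model's hidden states.

    The shared policy backbone does the heavy lifting; only this small head is
    PRM-specific, which is how the extra parameter count can stay in the tens
    of millions rather than billions.
    """

    def __init__(self, hidden_size: int = 5120):
        super().__init__()
        # Illustrative two-layer head; the actual 53M-parameter PRM may differ.
        self.score = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.SiLU(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, hidden_states: torch.Tensor,
                step_end_positions: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden) from the shared policy backbone.
        # step_end_positions: (batch, num_steps) long indices of the last token
        # of each reasoning step; only those positions are scored.
        index = step_end_positions.unsqueeze(-1).expand(-1, -1, hidden_states.size(-1))
        step_states = hidden_states.gather(1, index)
        return torch.sigmoid(self.score(step_states)).squeeze(-1)  # (batch, num_steps)
```

In a design like this, the hidden states are already produced while the policy generates a trajectory, so scoring candidate reasoning paths at test time costs only the head's small forward pass rather than a second full model.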
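The second sketch illustrates one way outcome rewards can supervise step-level scores without process annotations: every step of a sampled trajectory inherits the verifiable correctness of its final answer as a training target. The labeling rule and the `step_bce_loss` helper are hypothetical simplifications for illustration; MetaStone-S1's dynamic assignment may weight or filter steps differently.

```python
import torch
import torch.nn.functional as F

def outcome_to_step_targets(outcome_correct: torch.Tensor, num_steps: int) -> torch.Tensor:
    # outcome_correct: (batch,) float tensor, 1.0 if the trajectory's final
    # answer is verifiably correct (e.g. exact match on a math benchmark).
    # Every reasoning step inherits the outcome as its target, so no human
    # process-level annotation is required.
    return outcome_correct.unsqueeze(1).expand(-1, num_steps)

def step_bce_loss(step_scores: torch.Tensor, outcome_correct: torch.Tensor) -> torch.Tensor:
    # step_scores: (batch, num_steps) PRM probabilities in [0, 1], e.g. from
    # the shared-backbone scoring head sketched above.
    targets = outcome_to_step_targets(outcome_correct, step_scores.size(1))
    return F.binary_cross_entropy(step_scores, targets)

# Example usage (names are illustrative):
# scores = prm_head(hidden_states, step_end_positions)
# loss = step_bce_loss(scores, outcome_correct)
```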
To foster community-driven research, we have open-sourced MetaStone-S1. Code, models, and resources are available at https://github.com/MetaStone-AI/MetaStone-S1.