Process Reward Models (PRMs) have recently emerged as a powerful framework
for enhancing the reasoning capabilities of large reasoning models (LRMs),
particularly in the context of test-time scaling (TTS). However, their
potential for supervising LRMs in tabular reasoning domains remains
underexplored. Through detailed empirical analyses, we identify that existing
PRMs, though widely adopted for supervising text-only reasoning steps, struggle
with table-specific operations such as sub-table retrieval and schema
interaction, leading to critical performance bottlenecks. To address this
limitation, we propose TaTToo, a novel table-grounded PRM framework that (i)
reasons explicitly over tabular reasoning steps and (ii) integrates tool-based
verification to provide precise reward supervision. Concretely, we first design
a scalable data curation pipeline that constructs over 60k high-quality
step-level annotations by integrating table verification rationales with
tool-based execution. Building on the collected data, we train TaTToo with a
dual-stage paradigm: cold-start supervised fine-tuning to capture tool-use
reasoning patterns, followed by reinforcement learning with tool-grounded
reward shaping to align our model with table-based verification. We provide a
comprehensive evaluation of the policy improvement induced by our newly
designed PRM. Across 5 challenging tabular reasoning benchmarks covering
numerical reasoning, fact-checking, and data analysis, TaTToo improves
downstream policy LRMs by 30.9% at inference, surpasses strong PRM baselines
such as Qwen-2.5-Math-PRM-72B with only 8B parameters, and demonstrates strong
generalizability across diverse TTS strategies.