Paper Page - Training Step-Level Reasoning Verifiers With Formal Verification Tools

FoVer is a method that uses formal verification tools to automatically annotate step-level error labels for training Process Reward Models (PRMs). PRMs trained this way generalize across tasks and match or outperform PRMs trained on human-annotated labels on a range of reasoning benchmarks.

Process Reward Models (PRMs), which provide step-by-step feedback on the
reasoning generated by Large Language Models (LLMs), are receiving increasing
attention. However, two key research gaps remain: collecting accurate
step-level error labels for training typically requires costly human
annotation, and existing PRMs are limited to math reasoning problems. In
response to these gaps, this paper aims to address the challenges of automatic
dataset creation and the generalization of PRMs to diverse reasoning tasks. To
achieve this goal, we propose FoVer, an approach for training PRMs on
step-level error labels automatically annotated by formal verification tools,
such as Z3 for formal logic and Isabelle for theorem proving, which provide
automatic and accurate verification for symbolic tasks. Using this approach, we
synthesize a training dataset with error labels on LLM responses for formal
logic and theorem proving tasks without human annotation. Although this data
synthesis is feasible only for tasks compatible with formal verification, we
observe that LLM-based PRMs trained on our dataset exhibit cross-task
generalization, improving verification across diverse reasoning tasks.
Specifically, PRMs trained with FoVer significantly outperform baseline PRMs
based on the original LLMs and achieve competitive or superior results compared
to state-of-the-art PRMs trained on labels annotated by humans or stronger
models, as measured by step-level verification on ProcessBench and Best-of-K
performance across 12 reasoning benchmarks, including MATH, AIME, ANLI, MMLU,
and BBH. The datasets, models, and code are provided at
https://github.com/psunlpgroup/FoVer.
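
To make the idea of automatic step-level labeling concrete, here is a minimal sketch in Python. It is not the FoVer pipeline: instead of Z3 or Isabelle, it uses a brute-force truth-table check for propositional entailment as a stand-in verifier, and it assumes each reasoning step is checked against the original premises only. The function names (`entails`, `label_steps`) and the toy problem are hypothetical and chosen for illustration.

```python
from itertools import product

def entails(premises, conclusion, variables):
    """Return True if every truth assignment that satisfies all premises
    also satisfies the conclusion (brute-force stand-in for a solver)."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # found a counterexample
    return True

def label_steps(premises, steps, variables):
    """Assign a step-level label to each step: True if the step follows
    from the premises, False if it is an error.
    Assumption: steps are checked against the original premises only."""
    return [entails(premises, step, variables) for step in steps]

# Toy problem (hypothetical): premises are "p implies q" and "p".
variables = ["p", "q", "r"]
premises = [
    lambda e: (not e["p"]) or e["q"],  # p -> q
    lambda e: e["p"],                  # p
]

# Steps of an imagined model response, written as formulas.
steps = [
    lambda e: e["q"],             # "so q holds": valid
    lambda e: e["r"],             # "so r holds": not entailed
    lambda e: e["p"] and e["q"],  # "so p and q hold": valid
]

labels = label_steps(premises, steps, variables)
print(labels)  # [True, False, True]
```

Labels of this kind, one per step, are the training signal a PRM would be fitted to. A real setup would replace the truth-table check with calls to a formal verification tool and would need a way to translate each natural-language step into a formal statement, which this sketch leaves out.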
