This paper introduces Flexive, a novel generative verifier, and the Solve-Detect-Verify pipeline to address the trade-off between accuracy and computational efficiency in Large Language Model (LLM) reasoning.
Flexive dynamically balances “fast thinking” (rapid, resource-efficient error diagnosis) and “slow thinking” (meticulous, computationally intensive analysis) through a Flexible Allocation of Verification Budget strategy: it first runs efficient, parallel assessments to gauge verification difficulty, escalating to deeper analysis only when needed. Flexive is trained for mistake detection using Group Relative Policy Optimization (GRPO).
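The budget-allocation idea can be illustrated with a minimal sketch. This is not the paper's implementation; `fast_verify` and `slow_verify` are hypothetical stand-ins for cheap parallel verifier calls and an expensive deep-analysis pass, and the escalation rule (a simple agreement threshold) is an assumption for illustration.

```python
from collections import Counter

def fast_verify(solution: str) -> str:
    """Hypothetical cheap check: stand-in for a short, low-budget
    verifier call ("fast thinking"). Toy heuristic only."""
    return "correct" if "42" in solution else "incorrect"

def slow_verify(solution: str) -> str:
    """Hypothetical deep check: stand-in for a long, computationally
    intensive verification pass ("slow thinking")."""
    return "correct" if "42" in solution else "incorrect"

def flexible_verify(solution: str, k: int = 4, agreement: float = 0.75) -> str:
    """Run k cheap assessments in parallel; escalate to the expensive
    pass only when their verdicts disagree (the hard cases)."""
    verdicts = Counter(fast_verify(solution) for _ in range(k))
    verdict, count = verdicts.most_common(1)[0]
    if count / k >= agreement:
        return verdict            # consensus: trust the cheap checks
    return slow_verify(solution)  # ambiguous: spend the larger budget
```

The point of the escalation rule is that most candidate solutions are easy to judge, so the expensive pass is reserved for the minority of genuinely ambiguous cases.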
The Solve-Detect-Verify pipeline integrates Flexive into an efficient inference-time scaling framework. It consists of three stages:
Solve: An LLM generates an initial solution.
Detect: A lightweight mechanism monitors the LLM’s output for hesitation keywords and uses token log-probabilities to assess whether a solution is complete, potentially pausing generation early.
Verify and Refine: Flexive assesses the candidate solution. If correct, it’s finalized. If errors are found, Flexive’s feedback guides the solver to generate a single new, refined solution.
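The three stages above can be sketched end to end. This is a hedged illustration, not the paper's code: `solver` and `verifier` are hypothetical callables, and the completion heuristic (hesitation keywords plus an average log-probability threshold) is an assumed simplification of the detection mechanism.

```python
HESITATION = ("wait", "hmm", "let me reconsider")

def is_complete(text: str, avg_logprob: float, threshold: float = -1.0) -> bool:
    """Detect stage (assumed heuristic): treat the solution as complete
    when the model is confident and shows no hesitation keywords."""
    hesitating = any(k in text.lower() for k in HESITATION)
    return avg_logprob > threshold and not hesitating

def solve_detect_verify(solver, verifier, problem: str) -> str:
    """One pass of the pipeline: solve, detect, verify, refine at most once."""
    # Solve: the LLM produces a draft and its average token log-probability.
    solution, avg_lp = solver(problem, feedback=None)
    if not is_complete(solution, avg_lp):
        # In the real pipeline generation would continue past this point;
        # for illustration we simply accept the draft.
        pass
    # Verify: the verifier returns a verdict and, on failure, feedback.
    verdict, feedback = verifier(solution)
    if verdict == "correct":
        return solution
    # Refine: a single new solution guided by the verifier's feedback.
    refined, _ = solver(problem, feedback=feedback)
    return refined

# Toy solver/verifier pair to exercise the control flow.
def toy_solver(problem, feedback=None):
    return ("refined: 42", -0.2) if feedback else ("draft: 41", -0.5)

def toy_verifier(solution):
    return ("correct", None) if "42" in solution else ("incorrect", "arithmetic slip")
```

Note the single-refinement design: feedback from the verifier is used once, rather than looping, which keeps the inference-time cost bounded.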