Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
by Yuan Sui and 5 other authors
Abstract: Large Language Models (LLMs) increasingly rely on prolonged reasoning chains to solve complex tasks. However, this trial-and-error approach often leads to high computational overhead and error propagation, where early mistakes can derail subsequent steps. To address these issues, we introduce Meta-Reasoner, a framework that dynamically optimizes inference-time reasoning by enabling LLMs to "think about how to think." Drawing inspiration from human meta-cognition and dual-process theory, Meta-Reasoner operates as a strategic advisor, decoupling high-level guidance from step-by-step generation. It employs contextual multi-armed bandits to iteratively evaluate reasoning progress and select optimal strategies (e.g., backtrack, clarify ambiguity, restart from scratch, or propose alternative approaches), and reallocates computational resources toward the most promising paths. Our evaluations on mathematical reasoning and puzzles highlight the potential of dynamic reasoning chains to overcome inherent challenges in the LLM reasoning process and also show promise in broader applications, offering a scalable and adaptable solution for reasoning-intensive tasks.
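The abstract names contextual multi-armed bandits as the mechanism for picking a reasoning strategy from a progress summary. The sketch below illustrates that idea only: the epsilon-greedy scheme, the context strings, the strategy names, and the reward signal are assumptions for illustration, not the paper's implementation.

```python
import random

# Illustrative sketch only: the strategy list, contexts, and epsilon-greedy
# update rule are assumptions; the paper's actual bandit may differ.
STRATEGIES = ["continue", "backtrack", "clarify", "restart", "alternative"]

class ContextualBandit:
    """Epsilon-greedy bandit tracking per-(context, strategy) mean rewards."""

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {}  # (context, strategy) -> number of pulls
        self.values = {}  # (context, strategy) -> running mean reward

    def select(self, context):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(STRATEGIES)
        return max(STRATEGIES, key=lambda s: self.values.get((context, s), 0.0))

    def update(self, context, strategy, reward):
        key = (context, strategy)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        mean = self.values.get(key, 0.0)
        self.values[key] = mean + (reward - mean) / n  # incremental mean

# Toy warm start: in a hypothetical "stalled" context, only backtracking
# is rewarded, so the greedy choice converges to "backtrack".
bandit = ContextualBandit(epsilon=0.1, seed=42)
ctx = "stalled"
for s in STRATEGIES:
    bandit.update(ctx, s, 1.0 if s == "backtrack" else 0.0)
bandit.epsilon = 0.0  # act greedily once estimates exist
print(bandit.select(ctx))  # prints "backtrack"
```

In the paper's framing, the context would summarize reasoning progress so far and the reward would score how promising the resulting chain looks, letting compute shift toward strategies that historically helped in similar states.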
Submission history
From: Yuan Sui
[v1] Thu, 27 Feb 2025 09:40:13 UTC (1,759 KB)
[v2] Thu, 22 May 2025 08:15:25 UTC (1,762 KB)
[v3] Tue, 24 Jun 2025 08:27:42 UTC (1,080 KB)