Improving Rationality In The Reasoning Process Of Language Models Through Self-playing Game

arXiv:2506.22920v1 Announce Type: new
Abstract: Large language models (LLMs) have demonstrated considerable reasoning abilities in various tasks such as mathematics and coding. However, recent studies indicate that even the best models lack true comprehension of their reasoning processes. In this paper, we explore how self-play can enhance the rationality of models in the reasoning process without supervision from humans or superior models. We design a Critic-Discernment Game (CDG) in which a prover first provides a solution to a given problem and is subsequently challenged by critiques of its solution. These critiques aim either to assist or to mislead the prover. The objective of the prover is to maintain the correct answer when faced with misleading comments, while correcting errors in response to constructive feedback. Our experiments on tasks involving mathematical reasoning, stepwise error detection, self-correction, and long-chain reasoning demonstrate that CDG training can significantly improve the ability of well-aligned LLMs to comprehend their reasoning process.
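The game structure described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not specify a reward scheme, so the `Round` record, the +1/-1 rewards, and the function names are hypothetical and should not be read as the paper's actual training setup.

```python
from dataclasses import dataclass


@dataclass
class Round:
    """One hypothetical round of a critic-discernment style game."""
    first_correct: bool     # was the prover's initial solution correct?
    critique_helpful: bool  # True: constructive critique, False: misleading
    final_correct: bool     # is the prover's answer correct after the critique?


def prover_reward(r: Round) -> int:
    # Assumed rule: the prover wins if it ends on the correct answer,
    # whether by resisting a misleading critique or by accepting a helpful one.
    return 1 if r.final_correct else -1


def critic_reward(r: Round) -> int:
    # Assumed rule: a helpful critic shares the prover's goal,
    # while a misleading critic wins only if the prover ends up wrong.
    if r.critique_helpful:
        return 1 if r.final_correct else -1
    return -1 if r.final_correct else 1


def describe(r: Round) -> str:
    # Label the outcome in the abstract's terms.
    if r.first_correct and not r.critique_helpful:
        return "maintained" if r.final_correct else "misled"
    if not r.first_correct and r.critique_helpful:
        return "corrected" if r.final_correct else "uncorrected"
    return "other"


if __name__ == "__main__":
    # The prover starts correct, faces a misleading critique, and holds its answer.
    r = Round(first_correct=True, critique_helpful=False, final_correct=True)
    print(describe(r), prover_reward(r), critic_reward(r))
```

Under these assumed rules the misleading critic and the prover are in a zero-sum relationship, while the helpful critic and the prover are cooperative; the paper itself may define the objectives differently.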
