arXiv:2505.18492v1 Announce Type: new
Abstract: Mathematical reasoning lies at the heart of artificial intelligence, underpinning applications in education, program verification, and research-level mathematical discovery. Mathematical competitions, in particular, present two challenging problem types: theorem-proving, which requires rigorous proofs of stated conclusions, and answer-construction, which involves hypothesizing and formally verifying mathematical objects. Large Language Models (LLMs) effectively generate creative candidate answers but struggle with formal verification, while symbolic provers ensure rigor but cannot efficiently handle creative conjecture generation. We introduce the Enumerate-Conjecture-Prove (ECP) framework, a modular neuro-symbolic method that integrates LLM-based enumeration and pattern-driven conjecturing with formal theorem proving. We present ConstructiveBench, a dataset of 3,431 answer-construction problems drawn from various math competitions, each with a verified Lean formalization. On ConstructiveBench, ECP improves answer-construction accuracy from the Chain-of-Thought (CoT) baseline of 14.54% to 45.06% with the gpt-4.1-mini model. Moreover, when combined with ECP's constructed answers, the state-of-the-art DeepSeek-Prover-V2-7B model generates correct Lean proofs for 858 of the 3,431 constructive problems, achieving 25.01% accuracy, compared to 9.86% for symbolic-only baselines. Our code and dataset are publicly available on GitHub and HuggingFace, respectively.
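The answer-construction format described in the abstract can be illustrated with a small Lean sketch. The example below is hypothetical, not an actual ConstructiveBench item, and the names `answer` and `answer_spec` are illustrative assumptions about the encoding: the Enumerate and Conjecture stages would propose a value for the placeholder definition, and the Prove stage would close the accompanying theorem.

```lean
import Mathlib

-- Hypothetical placeholder that the Enumerate and Conjecture stages fill in
-- (shown here already instantiated with a correct value).
abbrev answer : ℕ := 5

-- The Prove stage must then formally certify the constructed answer.
theorem answer_spec : answer ^ 2 = 25 := by
  decide
```

In this style, constructing the answer (choosing `5`) and verifying it (proving `answer_spec`) are separate tasks, mirroring the framework's division of labor between LLM-based conjecturing and symbolic proving.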