Paper page - TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

Mathematical geometric problem solving (GPS) often requires effective integration of multimodal information and verifiable logical coherence. Despite the fast development of large language models in general problem solving, GPS remains unresolved in terms of both methodology and benchmarks. In particular, existing synthetic GPS benchmarks are often not self-verified and contain noise and self-contradictory information caused by LLM hallucinations. In this paper, we propose a scalable data engine called TrustGeoGen for problem generation, with formal verification to provide a principled benchmark, which we believe lays the foundation for the further development of methods for GPS. The engine synthesizes geometric data through four key innovations: 1) multimodal-aligned generation of diagrams, textual descriptions, and stepwise solutions; 2) formal verification ensuring rule-compliant reasoning paths; 3) a bootstrapping mechanism enabling complexity escalation via recursive state generation; and 4) our GeoExplore series of algorithms, which simultaneously produce multi-solution variants and self-reflective backtracking traces. Through formal logical verification, TrustGeoGen produces the GeoTrust-200K dataset with guaranteed modality integrity, along with the GeoTrust-test test set. Experiments reveal that state-of-the-art models achieve only 49.17% accuracy on GeoTrust-test, demonstrating its evaluation stringency. Crucially, models trained on GeoTrust achieve OOD generalization on GeoQA, significantly reducing logical inconsistencies relative to pseudo-labels annotated by OpenAI-o1.
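The abstract does not describe how TrustGeoGen's formal verification is implemented. As a rough illustration of what checking "rule-compliant reasoning paths" can mean, here is a minimal sketch that accepts a stepwise solution only if every step applies a known rule to facts that are already established and yields exactly the stated conclusion. The fact encoding, the two toy transitivity rules, and the names `Step`, `transitivity`, `RULES`, and `verify_path` are all hypothetical assumptions for illustration. They are not TrustGeoGen's actual engine or rule base.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Optional, Tuple

# Hypothetical fact encoding: (predicate, (arg1, arg2)), e.g. ("para", ("AB", "CD")).
Fact = Tuple[str, Tuple[str, ...]]
Rule = Callable[[Tuple[Fact, ...]], Optional[Fact]]


@dataclass(frozen=True)
class Step:
    rule: str                    # name of the rule this step claims to apply
    premises: Tuple[Fact, ...]   # facts the step relies on
    conclusion: Fact             # fact the step asserts


def transitivity(pred: str) -> Rule:
    """Build a toy rule: pred(a, b) and pred(b, c) imply pred(a, c)."""
    def rule(premises: Tuple[Fact, ...]) -> Optional[Fact]:
        if len(premises) != 2:
            return None
        if any(p[0] != pred or len(p[1]) != 2 for p in premises):
            return None
        (a, b), (c, d) = premises[0][1], premises[1][1]
        if b != c or a == d:
            return None
        return (pred, (a, d))
    return rule


# Hypothetical rule set; a real engine would need a full geometry rule base.
RULES: Dict[str, Rule] = {
    "para_trans": transitivity("para"),  # AB || CD, CD || EF => AB || EF
    "eq_trans": transitivity("eq"),      # x = y, y = z => x = z
}


def verify_path(givens: FrozenSet[Fact], steps: Tuple[Step, ...], goal: Fact) -> bool:
    """Accept a solution only if every step is a legal rule application
    on already-established facts and the goal is eventually derived."""
    known = set(givens)
    for step in steps:
        rule = RULES.get(step.rule)
        if rule is None:                                  # unknown rule name
            return False
        if not all(p in known for p in step.premises):   # premise not yet established
            return False
        if rule(step.premises) != step.conclusion:        # conclusion does not follow
            return False
        known.add(step.conclusion)
    return goal in known


givens = frozenset({("para", ("AB", "CD")), ("para", ("CD", "EF"))})
premises = (("para", ("AB", "CD")), ("para", ("CD", "EF")))
good = Step("para_trans", premises, ("para", ("AB", "EF")))
bad = Step("para_trans", premises, ("para", ("AB", "GH")))  # unsupported conclusion

print(verify_path(givens, (good,), ("para", ("AB", "EF"))))  # True
print(verify_path(givens, (bad,), ("para", ("AB", "GH"))))   # False
```

The rejected `bad` step mirrors the failure mode the abstract attributes to LLM-generated data: a solution step that asserts a fact the premises do not support. Under this sketch, such a step fails verification instead of entering the dataset.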

