Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding

arXiv:2505.07864v1
Abstract: Flowcharts are indispensable tools in software design and business-process analysis, yet current vision-language models (VLMs) frequently misinterpret the directional arrows and graph topology that set these diagrams apart from natural images. We introduce a seven-stage pipeline grouped into three broader processes: (1) arrow-aware detection of nodes and arrow endpoints; (2) optical character recognition (OCR) to extract node text; and (3) construction of a structured prompt that guides the VLM. Tested on a 90-question benchmark distilled from 30 annotated flowcharts, the method raises overall accuracy from 80% to 89% (+9 percentage points) without any task-specific fine-tuning. The gain is most pronounced for next-step queries (25/30 to 30/30, i.e. 83% to 100%, +17 pp); branch-result questions improve more modestly, and before-step questions remain difficult. A parallel evaluation with an LLM-as-a-Judge protocol shows the same trends, reinforcing the advantage of explicit arrow encoding. Limitations include dependence on detector and OCR precision, the small evaluation set, and residual errors at nodes with multiple incoming edges. Future work will enlarge the benchmark with synthetic and handwritten flowcharts and assess the approach on Business Process Model and Notation (BPMN) and Unified Modeling Language (UML).
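
The abstract does not spell out the prompt format, so the following is only a minimal sketch of what the third process (structured prompt construction) might look like. It assumes the detector and OCR stages have already produced node texts and directed tail-to-head arrows. The `Node` and `Arrow` classes, the `build_prompt` helper, and the example flowchart are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str  # hypothetical identifier assigned by the detection stage
    text: str     # hypothetical OCR output for this node


@dataclass(frozen=True)
class Arrow:
    tail: str        # id of the node the arrow leaves
    head: str        # id of the node the arrow points to
    label: str = ""  # optional branch label such as "Yes" or "No"


def build_prompt(nodes, arrows, question):
    """Serialize nodes and directed edges into a plain-text prompt."""
    lines = ["Flowchart nodes:"]
    for node in nodes:
        lines.append(f"- {node.node_id}: {node.text}")
    lines.append("Directed edges (tail -> head):")
    for arrow in arrows:
        suffix = f" [{arrow.label}]" if arrow.label else ""
        lines.append(f"- {arrow.tail} -> {arrow.head}{suffix}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


def next_steps(arrows, node_id):
    """Nodes reached by arrows leaving node_id (next-step queries)."""
    return [a.head for a in arrows if a.tail == node_id]


def previous_steps(arrows, node_id):
    """Nodes whose arrows point into node_id (before-step queries)."""
    return [a.tail for a in arrows if a.head == node_id]


# Made-up example flowchart, for illustration only.
nodes = [
    Node("A", "Start"),
    Node("B", "Input valid?"),
    Node("C", "Process order"),
    Node("D", "Show error"),
]
arrows = [
    Arrow("A", "B"),
    Arrow("B", "C", "Yes"),
    Arrow("B", "D", "No"),
    Arrow("D", "B"),
]

prompt = build_prompt(nodes, arrows, "What is the next step after 'Start'?")
print(prompt)
```

In this toy example node `B` has two incoming edges (from `A` and from `D`), which is the kind of case the abstract names as a source of residual errors. Listing every edge with an explicit direction gives the model that topology as text, so it does not have to infer it from the pixels.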
