Paper Page - ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

A modular multi-agent framework improves UI-to-code generation by integrating vision-language models, hierarchical layout planning, and adaptive prompt-based synthesis, achieving state-of-the-art performance.

Automating the transformation of user interface (UI) designs into front-end
code holds significant promise for accelerating software development and
democratizing design workflows. While recent large language models (LLMs) have
demonstrated progress in text-to-code generation, many existing approaches rely
solely on natural language prompts, limiting their effectiveness in capturing
spatial layout and visual design intent. In contrast, UI development in
practice is inherently multimodal, often starting from visual sketches or
mockups. To address this gap, we introduce a modular multi-agent framework that
performs UI-to-code generation in three interpretable stages: grounding,
planning, and generation. The grounding agent uses a vision-language model to
detect and label UI components, the planning agent constructs a hierarchical
layout using front-end engineering priors, and the generation agent produces
HTML/CSS code via adaptive prompt-based synthesis. This design improves
robustness, interpretability, and fidelity over end-to-end black-box methods.
Furthermore, we extend the framework into a scalable data engine that
automatically produces large-scale image-code pairs. Using these synthetic
examples, we fine-tune an open-source VLM and further train it with
reinforcement learning, yielding notable gains in UI understanding and code
quality. Extensive experiments demonstrate that
our approach achieves state-of-the-art performance in layout accuracy,
structural coherence, and code correctness. Our code is made publicly available
at https://github.com/leigest519/ScreenCoder.
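
The three stages described above (grounding, planning, generation) can be pictured with a minimal Python sketch. This is not the authors' implementation: every name here (`Component`, `ground`, `plan`, `generate`) is hypothetical, the grounding step returns hard-coded detections in place of a real vision-language model call, the planner uses a simple bounding-box containment rule as a stand-in for the paper's front-end engineering priors, and the generator fills a fixed HTML template rather than performing adaptive prompt-based synthesis.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class Component:
    label: str
    box: Box
    children: List["Component"] = field(default_factory=list)


def ground(screenshot_path: str) -> List[Component]:
    """Stage 1 (grounding). Hypothetical stub: a real system would query a
    vision-language model; here the detections are hard-coded."""
    return [
        Component("header", (0, 0, 800, 80)),
        Component("sidebar", (0, 80, 200, 520)),
        Component("main", (200, 80, 600, 520)),
        Component("button", (220, 100, 120, 40)),
    ]


def area(box: Box) -> int:
    return box[2] * box[3]


def contains(outer: Box, inner: Box) -> bool:
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ix + iw <= ox + ow and iy + ih <= oy + oh


def plan(components: List[Component]) -> Component:
    """Stage 2 (planning). Assumed rule: nest each component under the
    smallest already-placed component whose box fully contains it."""
    width = max(c.box[0] + c.box[2] for c in components)
    height = max(c.box[1] + c.box[3] for c in components)
    root = Component("page", (0, 0, width, height))
    placed = [root]
    # Place larger boxes first so candidate parents exist before children.
    for comp in sorted(components, key=lambda c: area(c.box), reverse=True):
        parents = [p for p in placed if contains(p.box, comp.box)]
        parent = min(parents, key=lambda p: area(p.box))
        parent.children.append(comp)
        placed.append(comp)
    return root


def generate(node: Component, origin: Tuple[int, int] = (0, 0), depth: int = 0) -> str:
    """Stage 3 (generation). Template-based stand-in: emits nested divs,
    each positioned relative to its parent's top-left corner."""
    x, y, w, h = node.box
    left, top = x - origin[0], y - origin[1]
    pad = "  " * depth
    style = f"position:absolute;left:{left}px;top:{top}px;width:{w}px;height:{h}px"
    lines = [f'{pad}<div class="{node.label}" style="{style}">']
    # Emit children in reading order: top to bottom, then left to right.
    for child in sorted(node.children, key=lambda c: (c.box[1], c.box[0])):
        lines.append(generate(child, (x, y), depth + 1))
    lines.append(f"{pad}</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    layout = plan(ground("mockup.png"))  # the path is a placeholder
    print(generate(layout))
```

Keeping the intermediate layout tree explicit is what makes each stage inspectable: the detections and the planned hierarchy can be checked or corrected before any markup is produced.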
