Paper Page - ComfyUI-R1: Exploring Reasoning Models For Workflow Generation

ComfyUI-R1, a large reasoning model for automated workflow generation, demonstrates superior performance in creating AI art workflows through long chain-of-thought reasoning and reinforcement learning.

AI-generated content has evolved from monolithic models to modular workflows,
particularly on platforms like ComfyUI, enabling customization in creative
pipelines. However, crafting effective workflows requires great expertise to
orchestrate numerous specialized components, presenting a steep learning curve
for users. To address this challenge, we introduce ComfyUI-R1, the first large
reasoning model for automated workflow generation. Starting with our curated
dataset of 4K workflows, we construct long chain-of-thought (CoT) reasoning
data, including node selection, workflow planning, and code-level workflow
representation. ComfyUI-R1 is trained through a two-stage framework: (1) CoT
fine-tuning for cold start, adapting models to the ComfyUI domain; (2)
reinforcement learning for incentivizing reasoning capability, guided by a
fine-grained rule-metric hybrid reward, ensuring format validity, structural
integrity, and node-level fidelity. Experiments show that our 7B-parameter
model achieves a 97% format validity rate, along with a high pass rate and
high node-level and graph-level F1 scores, significantly surpassing prior
state-of-the-art methods that employ leading closed-source models such as
GPT-4o and the Claude series. Further analysis highlights the critical role of the
reasoning process and the advantage of transforming workflows into code.
Qualitative comparison reveals our strength in synthesizing intricate workflows
with diverse nodes, underscoring the potential of long CoT reasoning in AI art
creation.
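
To make the idea of a rule-metric hybrid reward more concrete, here is a minimal Python sketch. It is not the paper's reward: the abstract does not give the formula, so everything below is an assumption for illustration. In particular, the JSON layout (`nodes`, `edges`, `id`, `type`, `src`, `dst`) is a hypothetical schema rather than ComfyUI's real workflow format, and the choice to gate the F1 metric behind hard rule checks is only one plausible design. The sketch treats format validity and structural integrity as pass/fail rules and node-level fidelity as a multiset F1 over node types.

```python
import json
from collections import Counter


def node_f1(predicted_types, reference_types):
    """Multiset F1 between predicted and reference node type names."""
    pred, ref = Counter(predicted_types), Counter(reference_types)
    overlap = sum((pred & ref).values())  # matched node instances
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def is_acyclic(node_ids, edges):
    """Kahn's algorithm: True if the directed graph has no cycle."""
    indegree = {n: 0 for n in node_ids}
    children = {n: [] for n in node_ids}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    visited = 0
    while ready:
        n = ready.pop()
        visited += 1
        for child in children[n]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return visited == len(indegree)


def hybrid_reward(output_text, reference_types):
    """Hypothetical reward: rules gate the score, F1 grades it."""
    # Rule 1 (format validity): output must parse into the assumed schema.
    try:
        workflow = json.loads(output_text)
        nodes = {str(n["id"]): n["type"] for n in workflow["nodes"]}
        edges = [(str(e["src"]), str(e["dst"])) for e in workflow["edges"]]
    except (ValueError, KeyError, TypeError):
        return 0.0
    # Rule 2 (structural integrity): edges must connect existing nodes
    # and the graph must be acyclic.
    if any(s not in nodes or d not in nodes for s, d in edges):
        return 0.0
    if not is_acyclic(nodes, edges):
        return 0.0
    # Metric (node-level fidelity): F1 against the reference node types.
    return node_f1(nodes.values(), reference_types)
```

Under these assumptions, a malformed or cyclic workflow scores 0, while a well-formed one is scored by how closely its node types match the reference. A graph-level F1 could be sketched the same way by comparing edge multisets instead of node types.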
