Flow-GRPO: Training Flow Matching Models via Online RL

arXiv:2505.05470v1 Announce Type: cross
Abstract: We propose Flow-GRPO, the first method to integrate online reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps, enabling statistical sampling for RL exploration; and (2) a Denoising Reduction strategy that reduces the number of denoising steps used during training while retaining the original number of inference timesteps, significantly improving sampling efficiency without performance degradation. Empirically, Flow-GRPO is effective across multiple text-to-image tasks. For complex compositions, RL-tuned SD3.5 generates nearly perfect object counts, spatial relations, and fine-grained attributes, boosting GenEval accuracy from $63\%$ to $95\%$. In visual text rendering, accuracy improves from $59\%$ to $92\%$. Flow-GRPO also achieves substantial gains in human preference alignment. Notably, little to no reward hacking occurred: rewards did not increase at the cost of image quality or diversity, both of which remained stable in our experiments.
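
The ODE-to-SDE idea can be checked on a toy problem. The sketch below assumes a rectified-flow interpolation x_t = (1 - t) x_0 + t ε (t = 1 is pure noise, t = 0 is data) and a 1-D Gaussian data distribution, so the velocity field is known in closed form. Under that interpolation the score follows from the velocity as ∇log p_t(x) = -(x + (1 - t) v)/t, and adding a score correction to the drift yields a reverse-time SDE whose marginals match the ODE's (a standard Fokker-Planck argument). The noise schedule `sigma`, the constant `S0`, and every function name are hypothetical choices for illustration, not the paper's implementation.

```python
import numpy as np

# Toy setup (assumption, not from the paper): data x0 ~ N(0, S0**2),
# noise eps ~ N(0, 1), interpolation x_t = (1 - t) * x0 + t * eps.
S0 = 0.5


def velocity(x, t):
    """Exact flow-matching velocity E[eps - x0 | x_t = x] for the toy Gaussian."""
    var_t = (1 - t) ** 2 * S0**2 + t**2
    return (t - (1 - t) * S0**2) / var_t * x


def sigma(t, a=0.7):
    """Hypothetical noise schedule; in theory any sigma_t >= 0 preserves the marginals."""
    return a * t


def sde_step(x, t, dt, rng):
    """One Euler-Maruyama step of the reverse-time SDE (dt < 0).

    Using grad log p_t(x) = -(x + (1 - t) * v) / t, the drift becomes
    v + sigma^2 / (2t) * (x + (1 - t) * v).
    Returns the new sample plus the Gaussian transition mean and std.
    """
    v = velocity(x, t)
    s = sigma(t)
    mean = x + (v + s**2 / (2 * t) * (x + (1 - t) * v)) * dt
    std = s * np.sqrt(-dt)
    return mean + std * rng.standard_normal(x.shape), mean, std


def gaussian_log_prob(x_next, mean, std):
    """Log-density of one SDE step, i.e. the per-step policy log-probability."""
    return -0.5 * ((x_next - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)


def sample(n, num_steps, stochastic, seed=0):
    """Integrate from t = 1 (noise) to t = 0 (data) with the ODE or the SDE."""
    rng = np.random.default_rng(seed)
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    x = rng.standard_normal(n)  # start from pure noise at t = 1
    for t, t_next in zip(ts[:-1], ts[1:]):
        dt = t_next - t
        if stochastic:
            x, _, _ = sde_step(x, t, dt, rng)
        else:
            x = x + velocity(x, t) * dt  # deterministic ODE (Euler) step
    return x


ode = sample(20_000, num_steps=200, stochastic=False)
sde = sample(20_000, num_steps=200, stochastic=True)
print(f"ODE std {ode.std():.3f}, SDE std {sde.std():.3f}, target {S0}")
```

Both samplers should end near the target standard deviation, because the SDE is built to share the ODE's marginals; the SDE additionally injects Gaussian noise at every step, so each transition has a tractable log-probability (`gaussian_log_prob`) of the kind a GRPO-style policy ratio needs. The `num_steps` argument is, loosely, the knob that Denoising Reduction lowers during training while inference keeps the larger value; the toy only illustrates that knob and does not reproduce the paper's setup.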
