Recent vision-language models (VLMs) achieve strong results on offline image and video understanding, but their performance in interactive, embodied environments remains limited. In closed-loop settings, an agent acts from a first-person view, and each decision alters its future observations. Even leading models such as GPT-4o, Claude 3.5 Sonnet, and Gemini 2.5 Pro struggle with spatial reasoning and long-horizon planning. We present EmbRACE-3K, a dataset of over 3,000 language-guided tasks set in diverse Unreal Engine environments. Each task spans multiple steps and pairs egocentric views with high-level instructions, grounded actions, and natural-language rationales. We benchmark VLMs on three core skills: exploration, dynamic spatial-semantic reasoning, and multi-stage goal execution. In zero-shot evaluation, all models achieve success rates below 20%, leaving clear room for improvement. Fine-tuning Qwen2.5-VL-7B with supervised learning and reinforcement learning yields consistent gains across all task types, demonstrating the value of EmbRACE-3K for developing embodied intelligence.