CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning - Takara TLDR

Autonomous agents for Graphical User Interfaces (GUIs) face significant
challenges in specialized domains such as scientific computing, where both
long-horizon planning and precise execution are required. Existing approaches
suffer from a trade-off: generalist agents excel at planning but perform poorly
in execution, while specialized agents demonstrate the opposite weakness.
Recent compositional frameworks attempt to bridge this gap by combining a
planner and an actor, but they are typically static and non-trainable, so they
cannot adapt from experience. This is a critical limitation given the
scarcity of high-quality data in scientific domains. To address these
limitations, we introduce CODA, a novel and trainable compositional framework
that integrates a generalist planner (Cerebrum) with a specialist executor
(Cerebellum), trained via a dedicated two-stage pipeline. In the first stage,
Specialization, we apply a decoupled GRPO approach to train an expert planner
for each scientific application individually, bootstrapping from a small set of
task trajectories. In the second stage, Generalization, we aggregate all
successful trajectories from the specialized experts to build a consolidated
dataset, which is then used for supervised fine-tuning of the final planner.
This equips CODA with both robust execution and cross-domain generalization.
Evaluated on four challenging applications from the ScienceBoard benchmark,
CODA significantly outperforms baselines and establishes a new state of the art
among open-source models.
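
The two-stage pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Trajectory` record, the function names, the application names, and the success threshold are all hypothetical, and the sketch covers only the group-relative reward normalization that GRPO is generally built on plus the trajectory-pooling step of the Generalization stage. How the planner and executor are decoupled during training is not specified in this summary and is not modeled here.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class Trajectory:
    app: str          # hypothetical application name
    task: str         # task instruction given to the planner
    plan: List[str]   # planner outputs, one per step
    reward: float     # scalar outcome reward (assumed: 1.0 means success)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of rollouts for the same task.

    Each rollout is scored relative to the group mean and scaled by the
    group standard deviation, so no separate value network is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def build_sft_dataset(
    per_app: Dict[str, List[Trajectory]], threshold: float = 1.0
) -> List[Trajectory]:
    """Pool successful trajectories from every per-application expert.

    Assumption: a trajectory counts as successful when its reward reaches
    the threshold; the pooled list would then feed supervised fine-tuning.
    """
    return [t for trajs in per_app.values() for t in trajs if t.reward >= threshold]


if __name__ == "__main__":
    # Stage 1 (Specialization): advantages for four rollouts of one task.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))

    # Stage 2 (Generalization): keep only successful trajectories.
    rollouts = {
        "app_a": [Trajectory("app_a", "t1", ["step"], 1.0),
                  Trajectory("app_a", "t2", ["step"], 0.0)],
        "app_b": [Trajectory("app_b", "t3", ["step"], 1.0)],
    }
    print(len(build_sft_dataset(rollouts)))
```

When every rollout in a group receives the same reward, all advantages are zero, so that group contributes no learning signal. This is one reason the sketch samples several rollouts per task.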
