Autonomous agents for Graphical User Interfaces (GUIs) face significant
challenges in specialized domains such as scientific computing, where both
long-horizon planning and precise execution are required. Existing approaches
suffer from a trade-off: generalist agents excel at planning but perform poorly
in execution, while specialized agents demonstrate the opposite weakness.
Recent compositional frameworks attempt to bridge this gap by combining a
planner and an actor, but they are typically static and non-trainable, which
prevents adaptation from experience. This is a critical limitation given the
scarcity of high-quality data in scientific domains. To address these
limitations, we introduce CODA, a novel and trainable compositional framework
that integrates a generalist planner (Cerebrum) with a specialist executor
(Cerebellum), trained via a dedicated two-stage pipeline. In the first stage,
Specialization, we apply a decoupled GRPO approach to train an expert planner
for each scientific application individually, bootstrapping from a small set of
task trajectories. In the second stage, Generalization, we aggregate all
successful trajectories from the specialized experts to build a consolidated
dataset, which is then used for supervised fine-tuning of the final planner.
This equips CODA with both robust execution and cross-domain generalization.
Evaluated on four challenging applications from the ScienceBoard benchmark,
CODA significantly outperforms baselines and establishes a new state of the art
among open-source models.