LLM-based web agents have recently made significant progress, but much of it has
occurred in closed-source systems—widening the gap with open-source alternatives.
Progress has been held back by two key challenges—first, a narrow focus on single-
step tasks that overlooks the complexity of multi-step web interactions, and second,
the high compute costs required to post-train LLM-based web agents. To address
these challenges, we present the first statistically grounded study on compute allocation for
LLM web-agent post-training. Our approach uses a two-stage pipeline, training
a Llama 3.1 8B student to imitate a Llama 3.3 70B teacher via supervised fine-tuning (SFT),
followed by on-policy reinforcement learning (RL). We find this process to be highly sensitive to
hyperparameter choices, making exhaustive sweeps impractical. To spare others from
expensive trial-and-error, we sample 1,370 configurations and use bootstrapping
to estimate effective hyperparameters. Our results show that combining SFT with
on-policy RL consistently outperforms either approach alone on both WorkArena
and MiniWoB++. Further, this strategy requires only 55% of the compute to match
the peak performance of pure SFT on MiniWoB++, pushing the compute–performance Pareto
frontier, and it is the only strategy that can close the gap with closed-source models.
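
To make the bootstrapping step concrete, the following is a minimal sketch, not the paper's actual procedure or search space: it resamples a set of (hyperparameter configuration, score) records with replacement and estimates, with confidence intervals, how often runs using a given hyperparameter value land in the top decile. The configuration fields, values, and scores below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for sampled configurations: each record pairs a
# hyperparameter setting with the success rate its run achieved.
configs = [
    {"lr": float(lr), "score": float(s)}
    for lr, s in zip(rng.choice([1e-5, 3e-5, 1e-4], size=200),
                     rng.uniform(0.0, 1.0, size=200))
]

def top_decile_rate(sample, key, value, q=0.9):
    # Fraction of runs using `value` for `key` that reach the top-q score quantile.
    scores = np.array([c["score"] for c in sample])
    cutoff = np.quantile(scores, q)
    uses = [c for c in sample if c[key] == value]
    hits = [c for c in uses if c["score"] >= cutoff]
    return len(hits) / max(len(uses), 1)

def bootstrap_ci(sample, key, value, n_boot=1000, alpha=0.05):
    # Resample configurations with replacement to get a confidence interval.
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(sample), size=len(sample))
        stats.append(top_decile_rate([sample[i] for i in idx], key, value))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(np.mean(stats)), (float(lo), float(hi))

for lr in [1e-5, 3e-5, 1e-4]:
    mean, (lo, hi) = bootstrap_ci(configs, "lr", lr)
    print(f"lr={lr:g}: P(top-10% run) = {mean:.2f}  (95% CI [{lo:.2f}, {hi:.2f}])")

The same resampling applies to any hyperparameter field in the records; comparing the resulting intervals across values is one simple way to read off which settings are reliably effective without an exhaustive sweep.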