Ferret-UI Lite: Lessons From Building Small On-Device GUI Agents - Takara TLDR

Developing autonomous agents that effectively interact with Graphical User
Interfaces (GUIs) remains a challenging open problem, especially for small
on-device models. In this paper, we present Ferret-UI Lite, a compact,
end-to-end GUI agent that operates across diverse platforms, including mobile,
web, and desktop. Utilizing techniques optimized for developing small models,
we build our 3B Ferret-UI Lite agent by curating a diverse GUI data
mixture from real and synthetic sources, strengthening inference-time
performance through chain-of-thought reasoning and visual tool-use, and
applying reinforcement learning with designed rewards. Ferret-UI Lite achieves
competitive performance with other small-scale GUI agents. In GUI grounding,
Ferret-UI Lite attains scores of 91.6%, 53.3%, and 61.2% on the
ScreenSpot-V2, ScreenSpot-Pro, and OSWorld-G benchmarks, respectively. For GUI
navigation, Ferret-UI Lite achieves success rates of 28.0% on AndroidWorld
and 19.8% on OSWorld. We share our methods and lessons learned from
developing compact, on-device GUI agents.
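
The abstract reports grounding scores without defining the metric. As a rough illustration only, the sketch below assumes the common convention for GUI grounding benchmarks: a prediction counts as correct when the predicted click point falls inside the target element's bounding box. The names `Box` and `grounding_accuracy` are hypothetical and not taken from the paper, and the paper's own evaluation code may differ.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical helper)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # A point on the border is treated as inside (an assumption).
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_accuracy(
    predictions: Sequence[Tuple[float, float]],
    targets: Sequence[Box],
) -> float:
    """Fraction of predicted click points that land inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    button = Box(left=0, top=0, right=20, bottom=20)
    clicks = [(10, 10), (50, 50), (5, 5)]
    # Two of the three clicks fall inside the button's box.
    print(f"{grounding_accuracy(clicks, [button] * 3):.3f}")
```

Under this assumed convention, a score such as 91.6% on ScreenSpot-V2 would mean that roughly nine in ten predicted click points landed inside the intended element.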
