Recent Computer-Using Agents (CUAs), powered by multimodal large language
models (LLMs), offer a promising direction for automating complex desktop
workflows through natural language. However, most existing CUAs remain
conceptual prototypes, hindered by shallow OS integration, fragile
screenshot-based interaction, and execution that disrupts the user's own work.
We present UFO2, a multiagent AgentOS for Windows desktops that elevates CUAs
into practical, system-level automation. UFO2 features a centralized HostAgent
for task decomposition and coordination, alongside a collection of
application-specialized AppAgents, each equipped with native APIs, domain-specific
knowledge, and a unified GUI–API action layer. This architecture enables
robust task execution while preserving modularity and extensibility. A hybrid
control detection pipeline fuses Windows UI Automation (UIA) with vision-based
parsing to support diverse interface styles. Runtime efficiency is further
enhanced through speculative multi-action planning, reducing per-step LLM
overhead. Finally, a Picture-in-Picture (PiP) interface enables automation
within an isolated virtual desktop, allowing agents and users to operate
concurrently without interference.
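As a rough illustration of this architecture (a minimal sketch, not the UFO2 implementation; all class and function names below are hypothetical), the following Python snippet shows a HostAgent decomposing a task into per-application subtasks and dispatching them to application-specialized AppAgents, each of which executes steps through a unified action layer that may resolve to either a GUI interaction or a native API call.

```python
# Hypothetical sketch of the HostAgent / AppAgent split with a unified
# GUI-or-API action layer. Names and structure are illustrative only.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Action:
    """A single step: either a GUI interaction or a native API call."""
    kind: str                      # "gui" or "api"
    target: str                    # control name or API endpoint
    execute: Callable[[], None]    # the concrete effect (stubbed here)


class AppAgent:
    """Agent specialized for one application, with its own action registry."""

    def __init__(self, app_name: str):
        self.app_name = app_name
        self._actions: Dict[str, Action] = {}

    def register(self, name: str, action: Action) -> None:
        self._actions[name] = action

    def run(self, subtask: str) -> None:
        # In UFO2 the next action would be selected by an LLM over detected
        # controls and available native APIs; here it is a simple lookup.
        action = self._actions[subtask]
        print(f"[{self.app_name}] {action.kind} -> {action.target}")
        action.execute()


class HostAgent:
    """Central coordinator: decomposes a task and routes to AppAgents."""

    def __init__(self, agents: Dict[str, AppAgent]):
        self.agents = agents

    def dispatch(self, plan: List[Tuple[str, str]]) -> None:
        # `plan` pairs each subtask with the application that should handle it.
        for app_name, subtask in plan:
            self.agents[app_name].run(subtask)


if __name__ == "__main__":
    excel = AppAgent("Excel")
    excel.register(
        "export_csv",
        Action(kind="api", target="Workbook.SaveAs", execute=lambda: None),
    )
    outlook = AppAgent("Outlook")
    outlook.register(
        "attach_and_send",
        Action(kind="gui", target="Send button", execute=lambda: None),
    )

    host = HostAgent({"Excel": excel, "Outlook": outlook})
    host.dispatch([("Excel", "export_csv"), ("Outlook", "attach_and_send")])
```

The point of the sketch is the separation of concerns the abstract describes: the HostAgent never touches application controls directly, while each AppAgent hides whether a step is carried out via the GUI or a native API behind the same action interface.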
We evaluate UFO2 on more than 20 real-world Windows applications,
demonstrating substantial improvements in robustness and execution accuracy
over prior CUAs. Our results show that deep OS integration unlocks a scalable
path toward reliable, user-aligned desktop automation.