Google DeepMind Releases Gemini 2.5 Computer Use Model

Google DeepMind has released the Gemini 2.5 Computer Use model, a specialised version of its Gemini 2.5 Pro model that can interact with user interfaces. The model is available in preview via the Gemini API through Google AI Studio and Vertex AI.

The model allows AI agents to complete tasks by interacting directly with graphical interfaces, for example by filling in forms, clicking buttons, scrolling and operating behind logins.

“The ability to natively fill out forms, manipulate interactive elements like dropdowns and filters and operate behind logins is a crucial next step in building powerful, general-purpose agents,” the company said.

Developers access the model through the computer-use tool, which operates in a loop. Inputs include the user request, a screenshot of the environment and a history of recent actions. The model generates responses in the form of UI actions, which are executed by client-side code. The loop continues with updated screenshots and context until the task is completed or terminated.
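The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Gemini API: the names `UIAction`, `run_agent_loop`, `scripted_model`, `max_steps` and `history_window` are hypothetical, and the model call is replaced by a stand-in function that replays a fixed list of actions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class UIAction:
    """One UI action proposed by the model (hypothetical shape, not the real API)."""
    name: str                                  # e.g. "click", "type", "scroll"
    args: dict = field(default_factory=dict)   # action parameters, e.g. coordinates


def run_agent_loop(
    request: str,
    propose_action: Callable[[str, bytes, List[UIAction]], Optional[UIAction]],
    execute_action: Callable[[UIAction], None],
    take_screenshot: Callable[[], bytes],
    max_steps: int = 20,
    history_window: int = 5,
) -> List[UIAction]:
    """Repeat screenshot -> model -> client-side execution until done or cut off."""
    history: List[UIAction] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                 # current state of the environment
        recent = history[-history_window:] if history_window > 0 else []
        action = propose_action(request, screenshot, recent)
        if action is None:                             # stand-in for "task completed"
            break
        execute_action(action)                         # client-side code performs the action
        history.append(action)
    return history


def scripted_model(script: List[UIAction]):
    """Stand-in for the model: replays a fixed script, then signals completion."""
    remaining = iter(script)

    def propose(request: str, screenshot: bytes, recent: List[UIAction]):
        return next(remaining, None)

    return propose


executed: List[UIAction] = []
history = run_agent_loop(
    request="Fill in the contact form",
    propose_action=scripted_model(
        [UIAction("click", {"x": 10, "y": 20}), UIAction("type", {"text": "hello"})]
    ),
    execute_action=executed.append,
    take_screenshot=lambda: b"",                       # placeholder screenshot bytes
)
print([a.name for a in history])  # ['click', 'type']
```

In a real integration, `propose_action` would wrap a call to the model and `execute_action` would drive a browser. The `max_steps` cap is one simple way to guarantee the loop terminates even if the task never completes.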

The model is optimised for web browsers and shows potential for mobile UI control, but is not yet designed for desktop operating system-level tasks. Demonstrations include transferring pet-care data to a CRM system and organising digital sticky notes into categories.

Gemini 2.5 Computer Use has demonstrated strong performance on web and mobile control benchmarks, including Online-Mind2Web, WebVoyager and AndroidWorld. According to DeepMind, the model delivers “high accuracy while maintaining low latency”, with accuracy above 70% and latency around 225 seconds.

Google DeepMind emphasised safety, noting that AI agents controlling computers carry risks such as misuse, unexpected behaviour and web-based scams.

The company said it has integrated safety features into the model and provides developers with controls to prevent harmful actions. “Developers can further specify that the agent either refuses or asks for user confirmation before it takes specific kinds of high-stakes actions,” DeepMind said.
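One way a developer might apply the "refuse or ask for confirmation" idea on the client side is a small policy gate that runs before each action is executed. The sketch below is an assumption-laden illustration, not the safety controls DeepMind ships: the action kinds, the `POLICY` table and the `gate_action` helper are all hypothetical.

```python
from enum import Enum
from typing import Callable, Dict


class Decision(Enum):
    ALLOW = "allow"      # execute without asking
    CONFIRM = "confirm"  # ask the user first
    REFUSE = "refuse"    # never execute


# Hypothetical policy: which kinds of action count as high-stakes.
POLICY: Dict[str, Decision] = {
    "purchase": Decision.CONFIRM,
    "delete_account": Decision.REFUSE,
}


def gate_action(
    action_kind: str,
    policy: Dict[str, Decision],
    ask_user: Callable[[str], bool],
) -> bool:
    """Return True if the action may be executed under the given policy."""
    decision = policy.get(action_kind, Decision.ALLOW)  # unlisted kinds are allowed
    if decision is Decision.REFUSE:
        return False
    if decision is Decision.CONFIRM:
        return bool(ask_user(action_kind))              # defer to the user's answer
    return True


# Example: a user who approves every confirmation prompt.
print(gate_action("purchase", POLICY, lambda kind: True))        # True
print(gate_action("delete_account", POLICY, lambda kind: True))  # False
```

Defaulting unlisted kinds to `ALLOW` keeps the example short. A stricter deployment could default to `CONFIRM` instead.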
