Advanced AI News
VentureBeat AI

Inside Google’s AI leap: Gemini 2.5 thinks deeper, speaks smarter and codes faster

By Advanced AI Editor | May 20, 2025 | 5 Mins Read



Google is moving closer to its goal of a “universal AI assistant” that can understand context, plan and take action. 

Today at Google I/O, the tech giant announced enhancements to Gemini 2.5 Flash — now improved across nearly every dimension, including benchmarks for reasoning, code and long context — and to 2.5 Pro, which gains an experimental enhanced reasoning mode, ‘Deep Think,’ that lets the model consider multiple hypotheses before responding. 

“This is our ultimate goal for the Gemini app: An AI that’s personal, proactive and powerful,” Demis Hassabis, CEO of Google DeepMind, said in a press pre-brief. 

‘Deep Think’ scores impressively on top benchmarks

Google announced Gemini 2.5 Pro — what it considers its most intelligent model yet, with a one-million-token context window — in March, and released its “I/O” coding edition earlier this month (with Hassabis calling it “the best coding model we’ve ever built!”). 

“We’ve been really impressed by what people have created, from turning sketches into interactive apps to simulating entire cities,” said Hassabis. 

He noted that, based on Google’s experience with AlphaGo, AI model responses improve when they’re given more time to think. This led DeepMind scientists to develop Deep Think, which uses Google’s latest cutting-edge research in thinking and reasoning, including parallel techniques.
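Google has not detailed Deep Think’s mechanics, but the “parallel techniques” described here are broadly reminiscent of self-consistency sampling: run several reasoning passes independently, then aggregate their answers. A stdlib-only sketch under that assumption (the `propose` function is a hypothetical stand-in for a single model pass, not anything Google has published):

```python
import random
from collections import Counter

def propose(question: str, rng: random.Random) -> str:
    # Hypothetical stand-in for one independent "thinking" pass of a model:
    # a noisy estimate of 17 * 24 that is right most of the time.
    answer = 17 * 24
    return str(answer if rng.random() < 0.9 else answer + rng.choice([-10, 10]))

def deep_think(question: str, n_hypotheses: int = 25, seed: int = 0) -> str:
    # Run many passes in parallel (sequentially here for simplicity),
    # then return the answer the majority of passes agree on.
    rng = random.Random(seed)
    candidates = [propose(question, rng) for _ in range(n_hypotheses)]
    return Counter(candidates).most_common(1)[0][0]

print(deep_think("What is 17 * 24?"))  # majority vote over 25 passes
```

The intuition matches the AlphaGo lesson in the text: spending more compute exploring alternatives before committing tends to improve the final answer.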

Deep Think has shown impressive scores on the hardest math and coding benchmarks, including the 2025 USA Mathematical Olympiad (USAMO). It also leads on LiveCodeBench, a difficult benchmark for competition-level coding, and scores 84.0% on MMMU, which tests multimodal understanding and reasoning.

Hassabis added, “We’re taking a bit of extra time to conduct more frontier safety evaluations and get further input from safety experts.” (Meaning: for now, Deep Think is available only to trusted testers via the API for feedback before the capability is made widely available.)

Overall, the new 2.5 Pro leads popular coding leaderboard WebDev Arena, with an ELO score — which measures the relative skill level of players in two-player games like chess — of 1420 (intermediate to proficient). It also leads across all categories of the LMArena leaderboard, which evaluates AI based on human preference. 
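The Elo scheme mentioned above updates two ratings after each pairwise comparison; arena-style leaderboards apply the same arithmetic to model-vs-model votes. A minimal implementation of the standard formulas:

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    # Probability that A beats B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    # score_a: 1.0 if A wins, 0.0 if A loses, 0.5 for a draw.
    # K controls how far a single result can move the ratings.
    expected_a = elo_expected(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Equal ratings: each side is expected to win half the time.
print(elo_expected(1420, 1420))  # 0.5

# A 1420-rated model beating a 1200-rated one gains only a few points,
# since the win was already expected.
a, b = elo_update(1420, 1200, 1.0)
print(round(a, 1), round(b, 1))
```

Updates are zero-sum: whatever one side gains, the other loses, so the leaderboard ranks relative skill rather than absolute quality.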

Important updates to Gemini 2.5 Pro, Flash

Also today, Google announced an enhanced 2.5 Flash, considered its workhorse model designed for speed, efficiency and low cost. 2.5 Flash has been improved across the board in benchmarks for reasoning, multimodality, code and long context — Hassabis noted that it’s “second only” to 2.5 Pro on the LMArena leaderboard. The model is also more efficient, using 20 to 30% fewer tokens.

Google is making final adjustments to 2.5 Flash based on developer feedback; it is now available for preview in Google AI Studio, Vertex AI and in the Gemini app. It will be generally available for production in early June.

Google is bringing additional capabilities to both Gemini 2.5 Pro and 2.5 Flash, including native audio output to create more natural conversational experiences, text-to-speech to support multiple speakers, thought summaries and thinking budgets. 

With native audio output (in preview), users can steer Gemini’s tone, accent and style of speaking (think: directing the model to be melodramatic or maudlin when telling a story). Like Project Mariner, the model is also equipped with tool use, allowing it to search on users’ behalf. 

Other experimental early voice features include affective dialogue, which gives the model the ability to detect emotion in user voice and respond appropriately; proactive audio that allows it to tune out background conversations; and thinking in the Live API to support more complex tasks. 

New multiple-speaker features in both Pro and Flash support more than 24 languages, and the models can quickly switch from one dialect to another. “Text-to-speech is expressive and can capture subtle nuances, such as whispers,” Koray Kavukcuoglu, CTO of Google DeepMind, and Tulsee Doshi, senior director for product management at Google DeepMind, wrote in a blog posted today. 

Further, 2.5 Pro and Flash now include thought summaries in the Gemini API and Vertex AI. These “take the model’s raw thoughts and organize them into a clear format with headers, key details, and information about model actions, like when they use tools,” Kavukcuoglu and Doshi explain. The goal is to provide a more structured, streamlined format for the model’s thinking process and give users interactions with Gemini that are simpler to understand and debug. 
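Conceptually, a thought summary is just the reasoning content of a response separated from the answer content. The sketch below uses a hypothetical `Part` structure — loosely modeled on responses whose parts carry a flag marking thought content, not the actual Gemini API types — to show the separation:

```python
from dataclasses import dataclass

@dataclass
class Part:
    text: str
    thought: bool = False  # True for reasoning content, False for the answer

def split_thoughts(parts: list[Part]) -> tuple[str, str]:
    """Separate a response into a thought summary and the final answer."""
    summary = "\n".join(p.text for p in parts if p.thought)
    answer = "\n".join(p.text for p in parts if not p.thought)
    return summary, answer

# A made-up response: two organized thought sections, then the answer.
parts = [
    Part("## Plan\nBreak the task into steps.", thought=True),
    Part("## Tool use\nCalled the search tool once.", thought=True),
    Part("Here is the final answer.", thought=False),
]
summary, answer = split_thoughts(parts)
print(summary)
print(answer)
```

Keeping the two streams distinct is what makes the model’s process inspectable and debuggable without cluttering the answer itself.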

Like 2.5 Flash, Pro is also now equipped with ‘thinking budgets,’ which give developers the ability to control the number of tokens a model uses to think before it responds or, if they prefer, to turn its thinking capabilities off altogether. This capability will be generally available in the coming weeks.
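The budget idea can be illustrated with a small stdlib sketch: reasoning proceeds step by step until the token allowance is spent, and a budget of zero disables thinking entirely. (This is a conceptual model of the behavior described above, not how Gemini implements it internally.)

```python
def think_with_budget(steps, budget_tokens: int):
    """Run reasoning steps until the token budget is exhausted.

    steps: iterable of (step_text, token_cost) pairs.
    A budget of 0 skips all thinking, mirroring the 'off' setting.
    """
    used, kept = 0, []
    for text, cost in steps:
        if used + cost > budget_tokens:
            break  # next step would exceed the budget; stop thinking
        used += cost
        kept.append(text)
    return kept, used

steps = [("outline the problem", 120), ("check edge cases", 200), ("verify result", 150)]
print(think_with_budget(steps, budget_tokens=350))  # first two steps fit
print(think_with_budget(steps, budget_tokens=0))    # thinking turned off
```

The developer-facing trade-off is the same as in the real feature: a larger budget buys more deliberation at higher latency and cost.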

Finally, Google has added native SDK support for Model Context Protocol (MCP) definitions in the Gemini API so that models can more easily integrate with open-source tools.
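In MCP, a server advertises each tool as a name, a description and a JSON Schema for its inputs; SDK support means such definitions can be consumed as tool declarations. A minimal illustrative sketch of one tool definition and a required-argument check (plain dicts, not the official SDK objects):

```python
# An MCP-style tool definition: name, description and a JSON Schema
# describing the tool's expected input, per the MCP specification.
weather_tool = {
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def validate_call(tool: dict, arguments: dict) -> bool:
    """Check that a call supplies every required argument.

    Covers only the 'required' subset of JSON Schema, for illustration.
    """
    schema = tool["inputSchema"]
    return all(key in arguments for key in schema.get("required", []))

print(validate_call(weather_tool, {"city": "Zurich"}))  # True
print(validate_call(weather_tool, {}))                  # False
```

Because the schema travels with the tool, a model can discover at runtime what a server offers and how to call it, which is the interoperability the SDK support targets.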

As Hassabis put it: “We’re living through a remarkable moment in history where AI is making possible an amazing new future. It’s been relentless progress.”
