DeepMind and OpenAI models solve maths problems at level of top students

A participant holds up a gold medal won at the 63rd International Mathematical Olympiad. Models from OpenAI and DeepMind achieved gold medal scores in the International Mathematical Olympiad. Credit: MoiraM/Alamy

Google DeepMind announced on 21 July that its software had cracked a set of maths problems at the level of the world’s top high-school students, achieving a gold-medal score on questions from the International Mathematical Olympiad. At first sight, this marked only a marginal improvement over the previous year’s performance. The company’s system had performed in the upper range of silver medal standard at the 2024 Olympiad, while this year it was evaluated in the lower range for a human gold medallist.

But the grades this year hide a “big paradigm shift,” says Thang Luong, a computer scientist at DeepMind in Mountain View, California. The company achieved its previous feats using two artificial intelligence (AI) tools specifically designed to carry out rigorous logical steps in mathematical proofs, called AlphaGeometry and AlphaProof. The process required human experts to first translate the problems’ statements into something similar to a programming language, and then to translate the AI’s solutions back into English.

“This year, everything is natural language, end to end,” says Luong. The team employed a large language model (LLM) called DeepThink, which is based on its Gemini system but with some additional developments that made it better and faster at producing mathematical arguments, such as handling multiple chains of thought in parallel. “For a long time, I didn’t think we could go that far with LLMs,” Luong adds.

DeepThink scored 35 out of 42 points on the 6 problems that had been given to participants in this year’s Olympiad. Under an agreement with the organizers, the computer’s solutions were marked by the same judges who evaluated the human participants.

Separately, ChatGPT creator OpenAI, based in San Francisco, California, had its own LLM solve the same Mathematical Olympiad problems at gold medal level, but had its solutions evaluated independently.

Impressive performance

For years, many AI researchers have fallen into one of two camps. Until 2012, the leading approach was to code the rules of logical thinking into the machine by hand. Since then, neural networks, which train automatically by learning from vast troves of data, have made a series of sensational breakthroughs, and tools such as OpenAI’s ChatGPT have now entered mainstream use.

Gary Marcus, a neuroscientist at New York University (NYU) in New York City, called the results by DeepMind and OpenAI “Awfully impressive.” Marcus is an advocate of the ‘coding logic by hand’ approach — also known as neurosymbolic AI — and a frequent critic of what he sees as hype surrounding LLMs. Still, writing on Substack with NYU computer scientist Ernest Davis, he commented that “to be able to solve math problems at the level of the top 67 high school students in the world is to have really good math problem solving chops”.

It remains to be seen whether LLM superiority on IMO problems is here to stay, or if neurosymbolic AI will claw its way back to the top. “At this point the two camps still keep developing,” says Luong, who works on both approaches. “They could converge together.”
