Google DeepMind has unveiled artificial intelligence models that further advance reasoning capabilities in robotics, enabling robots to solve harder problems and complete more complicated real-world tasks such as sorting laundry and recycling rubbish.
The company’s new robotics models, called Gemini Robotics 1.5 and Gemini Robotics-ER 1.5, are designed to help robots complete multi-step tasks by “thinking” before they act, as part of the tech industry’s push to make the general-purpose machines more useful in the everyday world.
According to Google DeepMind, a robot trained using its new model was able to plan how to complete tasks that might take several minutes, such as sorting laundry into different baskets by colour.
The development comes as tech groups, including OpenAI and Tesla, are racing to integrate AI models into robots in the hope that they could transform a range of industries, from healthcare to manufacturing.
“Models up to now were able to do really well at doing one instruction at a time,” said Carolina Parada, senior director and head of robotics at Google DeepMind. “We’re now moving from one instruction to actually genuine understanding and problem solving for physical tasks.”
In March, Google DeepMind unveiled the first iteration of these models, which took advantage of the company’s Gemini 2.0 system to help robots adjust to new situations, respond quickly to verbal instructions or changes in their environment, and be dexterous enough to manipulate objects.
While that version was able to reason about how to complete tasks such as folding paper or unzipping a bag, the latest model can follow a series of instructions and use tools such as Google search to help it solve problems.
In one demonstration, a Google DeepMind researcher asked the robot to pack a beanie into her bag for a trip to London. The robot told the researcher that it was going to rain for several days during the trip, and packed an umbrella into the bag as well.
The robot was also able to sort rubbish into the appropriate recycling bins, first using online tools to work out that it was based in San Francisco and then searching the web for the city’s recycling guidelines.
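As a rough illustration of that “think, search, then act” pattern, the sketch below shows an orchestrator that consults a web-search tool before issuing robot commands. It is a hypothetical outline, not DeepMind’s code: web_search, plan_actions and execute are invented stand-ins.

```python
# A hypothetical sketch of the "think, search, then act" loop described
# above. None of these functions are real DeepMind APIs; web_search,
# plan_actions and execute are invented placeholders.
def web_search(query: str) -> str:
    """Stand-in for a real search-tool call the model can make."""
    return "San Francisco: compost food scraps, recycle rigid plastics, ..."

def plan_actions(instruction: str, context: str) -> list[str]:
    """Stand-in for the reasoning model turning facts into steps."""
    return ["pick up banana peel", "place it in the green compost bin"]

def execute(step: str) -> None:
    """Stand-in for sending a command to the robot controller."""
    print(f"executing: {step}")

# Gather facts with a tool, plan a sequence of steps, then act on each.
guidelines = web_search("recycling guidelines for my city")
for step in plan_actions("sort the rubbish", guidelines):
    execute(step)
```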
Gemini Robotics 1.5 is a vision-language-action model, meaning it combines several different inputs, such as camera images and written instructions, and translates them into actions. These systems are able to learn about the world from data downloaded from the internet.
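The general shape of a vision-language-action policy can be sketched in a few lines. The toy model below is purely illustrative and assumes PyTorch; the layer sizes and class names are invented, not DeepMind’s architecture. It shows the core idea: fuse visual and language features, then decode them into a low-level action.

```python
# Minimal, illustrative vision-language-action (VLA) policy sketch.
# Assumes precomputed image and text features; real systems use large
# pretrained encoders. Not DeepMind's actual architecture.
import torch
import torch.nn as nn

class ToyVLAPolicy(nn.Module):
    def __init__(self, vision_dim=512, text_dim=512, action_dim=7):
        super().__init__()
        self.vision_proj = nn.Linear(vision_dim, 256)
        self.text_proj = nn.Linear(text_dim, 256)
        # Fused embedding is decoded into an action, e.g. a 7-DoF
        # end-effector command for a robot arm.
        self.action_head = nn.Sequential(
            nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )

    def forward(self, image_features, text_features):
        fused = torch.cat(
            [self.vision_proj(image_features), self.text_proj(text_features)],
            dim=-1,
        )
        return self.action_head(fused)

# One control step: camera features plus instruction features in, action out.
policy = ToyVLAPolicy()
action = policy(torch.randn(1, 512), torch.randn(1, 512))
print(action.shape)  # torch.Size([1, 7])
```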
Ingmar Posner, professor of applied artificial intelligence at the University of Oxford, said learning from internet-scale data could help robotics reach a “ChatGPT moment”.
But Angelo Cangelosi, co-director of the Manchester Centre for Robotics and AI, cautioned against describing what these robots are doing as real thinking. “It’s just discovering regularities between pixels, between images, between words, tokens, and so on,” he said.
Another development in Google DeepMind’s new system is a technique called “motion transfer”, which allows skills learned on one type of robot body, such as a pair of robotic arms, to be transferred to another, such as a humanoid robot.
Traditionally, getting robots to move around a space and take action has required meticulous planning and coding, and that training was often specific to a particular type of robot, such as a robotic arm. The “motion transfer” breakthrough could help ease a major bottleneck in AI robotics development: the lack of training data.
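One common way to frame cross-embodiment skill transfer, sketched below, is to learn skills in a shared latent space and decode them with a lightweight per-robot head. This is a hedged illustration of the general technique, under assumed dimensions and invented names, not DeepMind’s published method.

```python
# Illustrative sketch of cross-embodiment "motion transfer": an
# embodiment-agnostic trunk learns a latent motion intent, and a small
# decoder per robot body maps it into that body's action space.
# Dimensions and names are assumptions, not DeepMind's method.
import torch
import torch.nn as nn

class SharedSkillPolicy(nn.Module):
    def __init__(self, obs_dim=128, latent_dim=32):
        super().__init__()
        # Shared trunk: observation -> latent motion intent.
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                   nn.Linear(128, latent_dim))
        # One decoder per robot body, each with its own action space.
        self.decoders = nn.ModuleDict({
            "arm": nn.Linear(latent_dim, 7),        # 7-DoF arm command
            "humanoid": nn.Linear(latent_dim, 23),  # hypothetical joint count
        })

    def forward(self, obs, embodiment):
        return self.decoders[embodiment](self.trunk(obs))

policy = SharedSkillPolicy()
obs = torch.randn(1, 128)
arm_action = policy(obs, "arm")            # skill expressed on a robot arm
humanoid_action = policy(obs, "humanoid")  # same latent skill, retargeted
```

The point of the design is that data collected on one body improves the shared trunk, so every other body benefits, easing the data shortage described below.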
“Unlike large language models that can be trained on the entire vast internet of data, robotics has been limited by the painstaking process of collecting real [data for robots],” said Kanishka Rao, principal software engineer of robotics at Google DeepMind.
The company said it still needed to overcome a number of hurdles in the technology, including enabling robots to learn skills by watching videos of humans performing tasks.
It also said robots needed to become more dexterous as well as reliable and safe before they could be rolled out into environments where they interact with humans.
“One of the major challenges of building general robots is that things that are intuitive for humans are actually quite difficult for robots,” said Rao.