Google DeepMind has launched Genie 3, a world model that creates interactive 3D digital environments. This new world model is a stepping stone in the evolution of artificial general intelligence.
Genie 3 is the successor to Genie 2 and incorporates elements from models like Veo 3. The main purpose of the new tool is to train AI agents in realistic simulated environments.
Shlomi Fruchter, research director at DeepMind, said in a press briefing, “Genie 3 is the first real-time interactive general-purpose world model.”
“It goes beyond the narrow world models that existed before. It’s not specific to any particular environment. It can generate both photo-realistic and imaginary worlds, and everything in between,” he added.
Unlike previous video generation models, Genie 3 can respond instantly to commands, so users can create and navigate the generated world in real time.
The new world model also stays physically consistent over time. Prompted objects and details remain in place even if users move somewhere else and then return, thanks to a “visual memory” that retains things for up to one minute.
Genie 3 generates at a higher resolution of 720p, up from Genie 2’s 360p.
The most interesting feature is “promptable world events”, which let users change the state of the world with text prompts in real time. For example, you can add rain, a new vehicle, or a bird while the simulation is running.
According to DeepMind, Genie 3 can be used to train AI agents to deal with unexpected situations by creating “what if” scenarios. For example, a self-driving car could be trained on a sudden obstacle it has not encountered before.
Despite these improvements over previous models, Genie 3 also has some restrictions:
The range of actions AI agents can perform is still limited.
Genie 3 is said to have a deep understanding of physics, but it does not use hard-coded physics; it learns from the videos it was trained on.
The model cannot recreate the real world itself; it can only generate realistic environments.
The “visual memory” cannot run for hours; it spans about one minute.
Genie 3 is in a “limited research preview” and is not yet publicly available.
Who is the Google DeepMind CEO?
Sir Demis Hassabis is the CEO of Google DeepMind.