🤖 Generative AI gives robots new "brain" that understands the world

Google DeepMind has integrated its multimodal language model Gemini 2.0 into robots, giving them an advanced AI brain that can understand and interact with the physical world. This enable robots to perform tasks they were never trained for by understanding everyday instructions in natural language.

Mathias Sundin 23.Mar.2025 2 min read

Share this story!

Google DeepMind has integrated its multimodal language model Gemini 2.0 into robots, giving them an advanced AI brain that can understand and interact with the physical world.
The new generative AI systems enable robots to perform tasks they were never trained for by understanding everyday instructions in natural language.
The implementation of generative language models in robot control has more than doubled the robots' ability to adapt to new situations compared to previous systems.

Generative AI becomes robots' new brain for understanding our physical world

Google DeepMind has integrated generative AI into robots. The company has built its multimodal language model Gemini 2.0 into two new robot systems: Gemini Robotics and Gemini Robotics-ER. This can be likened to robots now getting a new type of brain that can understand the world in a more human-like way.

Gemini 2.0 is a powerful generative AI model that can understand and work with text, images, audio, and video. Previously, this type of AI was primarily used in the digital world, but now Google has expanded the model to also control physical robots.

Five examples of how generative AI is changing robots' capabilities

A robot can now pack a snack into a plastic bag by understanding what the task entails without being specifically trained for it.
When an object slips from the robot's grip, it can use the generative AI model to quickly replan and continue with the task.
Robots can now understand and respond to commands given in everyday, conversational language in different languages thanks to the language model's comprehension ability.
When a robot is shown a coffee mug, the generative AI model can intuitively understand how to grip the mug by the handle in a safe way.
Robots can now fold origami by understanding the complex, multi-step instructions required for this type of precise manipulation.

Generative AI gives robots three fundamental capabilities

For robots to be useful in real situations, Google DeepMind has identified three main properties that the generative AI model now enables: generality, interactivity, and dexterity.

Using the Gemini model's world understanding, robots can now adapt to new situations and solve tasks they have never encountered before. According to the company's technical report, the new generative AI control more than doubles the robots' ability to generalize compared to previous systems.

The generative AI model also makes robots intuitively interactive - they can understand instructions given in natural language, continuously monitor their surroundings, detect changes, and adapt their actions accordingly.

Generative AI adapts to different robot platforms

One of the major advantages of using generative AI for robot control is that the same model can be adapted to different types of robots. The model was primarily trained on data from the two-armed robot platform ALOHA 2, but has proven capable of controlling other platforms used in academic environments.

The generative AI model can even be specialized for more complex robots, such as the humanoid robot Apollo developed by Apptronik, with the goal of performing real-world tasks.

Improved world understanding with generative AI

Gemini Robotics-ER enhances the generative AI model's spatial understanding of the physical world. By combining this spatial reasoning with the model's coding abilities, new functions can be created on the fly.

In a complete environment where the generative AI model handles all steps from perception to code generation, the system achieves a 2-3 times higher success rate compared to the basic Gemini 2.0 model.

WALL-Y
WALL-Y is an AI bot created in ChatGPT. Learn more about WALL-Y and how we develop her. You can find her news here.
You can chat with WALL-Y GPT about this news article and fact-based optimism (requires the paid version of ChatGPT.)