Press "Enter" to skip to content

Google Pairs New AI Models With Robotics

In a series of videos, Google DeepMind demonstrates its progress in making robots smarter and more agile by pairing them with Gemini AI models.

What do you get when you pair large language models with robots? More intelligent robots that you can control with a prompt or a voice command. That is the simple theory; in practice it is a bit more complicated. Google has been working on this technology for a while through DeepMind.

The goal is to bring the full ability of Gemini, Google's chatbot and generative AI model, to reason, understand, and act into the physical world via robotics.

In a blog update, Google shows how far it has come. Built on Gemini 2.0, Gemini Robotics and Gemini Robotics-ER now form the basis for a new generation of helpful robots. Thanks to the Gemini integration, these robots become more versatile and interactive. In the demo below, for example, a DeepMind employee shows how a robot arm is directed by voice and then completes a specific task on its own.

In this example, the robot must recognise the banana in a fruit bowl, grab it, and move it to the correct bowl. In the same demo, the employee raises the difficulty by moving the bowls mid-task. This is just one of the demos Google shares in the blog update.

Gemini Robotics and Gemini Robotics-ER are Google's most advanced vision-language-action (VLA) models. They build on Gemini's multimodal understanding of the world and add physical actions as a new output modality. Gemini Robotics-ER additionally provides advanced spatial understanding, which robotics experts can build on to run their own programs using Gemini's embodied reasoning capabilities. Google is developing this technology together with Apptronik, a specialist in humanoid robots.
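To make the vision-language-action idea concrete, here is a minimal, hypothetical sketch of the pattern: a camera frame and a natural-language instruction go in, and a physical action comes out, re-queried each step so the robot can adapt when objects move. The client function, action format, and coordinates below are illustrative assumptions, not the published Gemini Robotics interface.

```python
# A minimal sketch of the vision-language-action pattern, assuming a
# hypothetical model client. This is NOT the real Gemini Robotics API;
# all names and values here are placeholders for illustration.
from dataclasses import dataclass


@dataclass
class Action:
    """A low-level command a robot arm could execute."""
    gripper: str                               # e.g. "open" or "close"
    target_xyz: tuple[float, float, float]     # workspace coordinates in metres


def query_vla_model(camera_frame: bytes, instruction: str) -> Action:
    """Stand-in for a vision-language-action model call.

    A real VLA model would take the camera frame and the instruction as
    multimodal input and emit physical actions as its output modality.
    Here we return a fixed action purely for illustration.
    """
    # Hypothetical: the model grounds "banana" in the frame and plans a grasp.
    return Action(gripper="close", target_xyz=(0.42, -0.10, 0.05))


def control_loop(instruction: str) -> None:
    """Re-query the model every step so the plan updates if the scene changes."""
    for step in range(3):  # a real loop would run until the task succeeds
        frame = b"<camera frame bytes>"  # placeholder for live sensor input
        action = query_vla_model(frame, instruction)
        print(f"step {step}: move to {action.target_xyz}, gripper {action.gripper}")


control_loop("Pick up the banana and put it in the right-hand bowl")
```

Re-querying the model with a fresh camera frame on every step is what would let a robot keep tracking the bowl even after the employee moves it, as in the demo above.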
