DeepMind’s Gemini Robotics Empowers Robots with Advanced Reasoning Capabilities

Key Takeaways

Google DeepMind has launched Gemini Robotics 1.5, enhancing robots’ ability to perceive and plan complex tasks.
The new update features two systems: Gemini Robotics 1.5 for motor command execution and Gemini Robotics-ER 1.5 for “embodied reasoning” using digital tools.
Robots can now share learned skills across different robot types, allowing faster adaptation to new tasks.

Improved Autonomy and Intelligence

Google DeepMind has unveiled Gemini Robotics 1.5, the latest in its series of vision-language-action models aimed at enhancing robotic capabilities. The upgrade is designed to improve robots’ perception and cognitive abilities, allowing them to plan and execute complex tasks with increased autonomy.

The release features two integrated systems: Gemini Robotics 1.5, which translates visual data and verbal instructions into motor commands, and Gemini Robotics-ER 1.5, an “embodied reasoning” model. The latter uses digital tools, such as web searches, to plan a task before delegating the individual actions to its counterpart. Working together, the two models let robots “think” before they act, explain their decisions, and adapt to context-dependent scenarios such as sorting laundry by color or packing luggage based on weather conditions.
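The division of labor between the two models can be illustrated with a minimal Python sketch. It is not DeepMind’s implementation and does not call the Gemini API: the class names ReasoningPlanner and ActionExecutor, the fake_weather tool, and the packing rules are all hypothetical stand-ins chosen to show a reasoning component that consults a tool, produces a plan, and hands each step to an action component.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One natural-language instruction passed from planner to executor."""
    instruction: str


class ReasoningPlanner:
    """Hypothetical stand-in for an embodied-reasoning model."""

    def __init__(self, tools: Dict[str, Callable[[str], str]]):
        # Tools are plain functions here; a real system might use web search.
        self.tools = tools

    def plan(self, destination: str) -> List[Step]:
        # Consult a tool first, then decide what to pack (illustrative rules).
        forecast = self.tools["weather"](destination)
        items = ["shirt", "trousers"]
        if "rain" in forecast:
            items.append("umbrella")
        return [Step(f"place {item} in suitcase") for item in items]


class ActionExecutor:
    """Hypothetical stand-in for a vision-language-action model."""

    def execute(self, step: Step) -> str:
        # A real model would emit motor commands; this sketch only reports.
        return f"done: {step.instruction}"


def run(destination: str, planner: ReasoningPlanner,
        executor: ActionExecutor) -> List[str]:
    """Plan first, then delegate each step to the executor in order."""
    return [executor.execute(step) for step in planner.plan(destination)]


def fake_weather(city: str) -> str:
    """Assumed tool: returns a canned forecast instead of a live lookup."""
    return f"rain expected in {city}"


planner = ReasoningPlanner(tools={"weather": fake_weather})
log = run("London", planner, ActionExecutor())
print(log)
```

Because the canned forecast mentions rain, the plan includes an umbrella. Swapping fake_weather for a function that returns a dry forecast changes the plan without touching the executor, which is the point of keeping planning and action separate.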

In a blog post dated September 25, DeepMind described the release of Gemini Robotics 1.5 as a “foundational step” towards achieving artificial general intelligence (AGI). According to the company, “Gemini Robotics 1.5 marks an important milestone toward solving AGI in the physical world.” By introducing agentic capabilities, DeepMind is moving beyond models that merely react to commands and toward systems that can reason, plan, use tools, and generalize across different contexts.

Another significant advancement in this iteration is the robots’ ability to share skills across different robotic forms. In tests, tasks learned by the dual-arm ALOHA2 robot could be applied directly to the Franka bi-arm robot and Apptronik’s humanoid Apollo robot without retraining. ALOHA2 is a collaborative project between DeepMind and Stanford University, and the Franka robot is developed by Germany-based Agile Robots AG. Austin-based Apptronik is competing with Tesla’s Optimus humanoid robot initiative.

DeepMind remarked, “This breakthrough accelerates learning new behaviors, helping robots become smarter and more useful.” Because a skill learned on one platform can be reused on another, each new robot needs less time to learn a task.

Google plans to make Gemini Robotics-ER 1.5 available to developers through the Gemini API in Google AI Studio. However, access to Gemini Robotics 1.5 will initially be limited to select partners, reflecting an incremental approach to rolling out this advanced technology.

Overall, Gemini Robotics 1.5 aims to make robots more intelligent, adaptable, and efficient at executing tasks, which DeepMind presents as a significant advance towards AGI in the physical world.
