AI Robotics

The State of AI for Robotics in 2025

The field of embodied AI, the intersection of Artificial Intelligence and Robotics, is experiencing rapid advancements, largely driven by breakthroughs in AI models. Google's release of Gemini for robotics, a new family of AI models specifically designed for humanoid robots, marks a significant milestone. This development, alongside efforts from companies like Figure AI, Unitree, Nvidia, and various startups, indicates a shift from highly specialised robotic tasks to more generalised, adaptable, and interactive capabilities. Key themes include the vertical integration of hardware and AI, the emergence of dual-model systems for reasoning and execution, the importance of generalisation and dexterity, and the increasing commercial viability of humanoid robots, moving beyond speculative phases.

Key Themes and Important Ideas

A. The Challenge of Embodied AI and the Need for Generalisation

Historically, humanoid robots have required "specific training for each action," with AI models primarily assisting with "edge cases and little deviations." In practice, this meant a robot could easily mix a drink during a demo because it had been trained on that exact task, yet would have struggled if a patron simply asked to shake its hand, unless a human took over control. The central problem that new AI models aim to solve is enabling robots to perform "generalized tasks" and "adapt to different situations" without explicit pre-training for every scenario.

B. Google DeepMind's Gemini Robotics: A Breakthrough in Generalised Embodied AI

Google DeepMind's Gemini Robotics is a pivotal development in this space. Built on Gemini 2.0, it inherits "native multimodal functionality," allowing it to process visual, text, and audio inputs. DeepMind outlines three principal qualities for useful robotic AI models:

  • General: "meaning they're able to adapt to different situations."
  • Interactive: "meaning they can understand and respond quickly to instructions or changes in their environment."
  • Dexterous: "meaning they can do the kind of things people generally do with their hands and fingers, like carefully manipulate objects."

Gemini Robotics achieves this through a dual-model system:

  • Vision Language Action (VLA) model: Similar to other multimodal LLMs, but includes "physical actions as a new mode of output." This yields strong instruction-following results and enables robots to "generalize to novel situations and solve a wide variety of tasks out of the box, including tasks it has never seen before in training."
  • Gemini Robotics-ER (Embodied Reasoning): This model "takes the premise behind reasoning models and applies it to physical environments," offering "advanced spatial understanding" and the ability to "instantiate entirely new capabilities on the fly." Presented with an object such as a mug, for example, it can determine "an appropriate two-finger grasp for picking it up by the handle and a safe trajectory for approaching it."

This system allows robots to move "from a narrow range of specific tasks to much more generalized applications." As Kirana Gopala Krishnan of DeepMind put it, "it's the first time where I've personally felt that building generic embodied intelligence is within reach, like a robot coming to life."
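To make "physical actions as a new mode of output" concrete, here is a minimal sketch of the action-tokenization idea used by VLA models such as RT-2 (a precursor in this line of work): each continuous action dimension is discretized into bins, so emitting an action is just next-token prediction. All constants and function names below are illustrative, not Gemini Robotics APIs.

```python
import numpy as np

# Hedged sketch: a VLA model reserves a block of output tokens for actions.
# Each continuous action dimension (e.g. a gripper delta) is binned into one
# of N_BINS discrete values, so the language model can "speak" motor commands.

N_BINS = 256                          # tokens reserved per action dimension
ACTION_LOW, ACTION_HIGH = -1.0, 1.0   # normalized joint-delta range

def action_to_tokens(action: np.ndarray) -> list[int]:
    """Discretize a continuous action vector into per-dimension bin indices."""
    clipped = np.clip(action, ACTION_LOW, ACTION_HIGH)
    scaled = (clipped - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW)  # -> [0, 1]
    return (scaled * (N_BINS - 1)).round().astype(int).tolist()

def tokens_to_action(tokens: list[int]) -> np.ndarray:
    """Invert the discretization back to (approximate) continuous actions."""
    scaled = np.asarray(tokens) / (N_BINS - 1)
    return scaled * (ACTION_HIGH - ACTION_LOW) + ACTION_LOW

delta = np.array([0.25, -0.5, 0.0])   # e.g. hypothetical x/y/z gripper deltas
tokens = action_to_tokens(delta)      # what the model would emit
recovered = tokens_to_action(tokens)  # what the controller would execute
```

The round-trip loses at most half a bin width of precision, which is why 256 bins per dimension tends to be enough for manipulation tasks.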

C. Vertical Integration: Hardware and AI Must Converge

Figure AI's decision to "ditch their partnership with OpenAI to use their own models developed in house" highlights a growing sentiment that to "solve embodied AI at scale in the real world you have to vertically integrate robot AI." Figure AI CEO Brett Adcock put it explicitly: "We can't outsource AI for the same reason we can't outsource our hardware." This approach aims to ensure seamless integration and optimal performance.

D. The Rise of Dual-Model Systems for Reasoning and Execution

Both Google and Figure AI are converging on a similar "basic system design of pairing a reasoning model with an execution model." This mirrors the design of current AI agents, where a "reasoning model for planning and analysis of the situation" hands off to a "separate model for execution." This suggests "embodied AI as agents with eyes and hands."
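The "eyes and hands" agent pattern described above can be sketched in a few lines: a slow, deliberate reasoning model decomposes a goal into steps, and a fast execution model turns each step into low-level commands. Both models here are stubs with invented names; a real system would call a vision-language model for planning and a control policy for execution.

```python
from dataclasses import dataclass

@dataclass
class Step:
    description: str

def reasoning_model(goal: str, scene: dict) -> list[Step]:
    """Stand-in planner: decompose a goal into steps using scene context."""
    target = scene.get("object", "object")
    grasp = scene.get("grasp", "handle")
    return [Step(f"locate the {target}"),
            Step(f"grasp the {target} by the {grasp}"),
            Step(f"place the {target} at the goal: {goal}")]

def execution_model(step: Step) -> str:
    """Stand-in controller: map a step to a (fake) motor command string."""
    return f"EXECUTE[{step.description}]"

def run(goal: str, scene: dict) -> list[str]:
    plan = reasoning_model(goal, scene)        # slow, deliberate planning
    return [execution_model(s) for s in plan]  # fast, reactive execution

commands = run("on the shelf", {"object": "mug", "grasp": "handle"})
```

The design choice mirrors software AI agents: keeping planning and execution in separate models lets each run at its own cadence, with the planner re-invoked only when the environment changes.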

E. Commercialisation and Investment Boom

The sector is moving beyond a purely speculative phase, with significant commercial deployments and valuations:

  • Figure AI: Valued at $2.6 billion during its Series B, now reportedly in talks to raise a Series C at a "$39.5 billion" valuation. They have pilot programmes at the "BMW manufacturing plant in South Carolina" and an "undisclosed contract that the company says could potentially allow them to reach 100,000 robots shipped."
  • Unitree: Offering G1 units starting at "$16,000," indicating a push towards more accessible hardware, with expected price reductions.
  • Dexterity Inc.: Raised "$95 million at a $1.65 billion valuation" for robots capable of "humanlike dexterity," aligning with Google's criteria for generalised robotics: "touch and recognize objects are aware of and respond appropriately to surroundings and will move gracefully and adjust as needed."
  • Apptronik: Raised "$350 million in Series A funding" at an undisclosed valuation, having worked on humanoid robots for over a decade. They are partnering with Google DeepMind for the AI driving their robots, with CEO Jeff Cardenas stating that "what 2025 is about for Apptronik and the humanoid industry is really demonstrating useful work in these applications with these initial early adopters and customers, and then true commercialization and scaling happening in 2026 and beyond."

F. Nvidia's Role in Training and Simulation

Nvidia, while not building robots themselves, is a key player in providing the AI infrastructure for training. Their "Cosmos World Foundation model" allows for "virtual simulations of real world scenarios for robot training." This "digital twin" approach enables synthetic training data to be generated quickly, leading to big improvements in dexterity and specific movement training. Nvidia CEO Jensen Huang is bullish on the future, stating that "the ChatGPT moment for general robotics is just around the corner," and predicting that self-driving cars will be the "first multi-trillion dollar robotics industry." He also expects Nvidia's products to power "a billion humanoid robots over the coming years."
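A core technique behind this simulation-driven training is domain randomization: scene parameters (lighting, object pose, friction) are randomized per episode so a policy trained in simulation transfers to the messier real world. The sketch below illustrates the idea with a stub scene generator; Nvidia's actual stack (Isaac Sim, Cosmos) operates on photorealistic scenes, not dictionaries, and all names and parameter ranges here are hypothetical.

```python
import random

def randomized_scene(rng: random.Random) -> dict:
    """Sample one synthetic training scene with randomized physics/visuals."""
    return {
        "light_intensity": rng.uniform(0.3, 1.5),                 # lux scale
        "object_xy": (rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2)),
        "friction": rng.uniform(0.4, 1.2),                        # coefficient
    }

def generate_dataset(n_episodes: int, seed: int = 0) -> list[dict]:
    """Generate a reproducible batch of randomized scenes for training."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    return [randomized_scene(rng) for _ in range(n_episodes)]

data = generate_dataset(1000)
```

Seeding the generator keeps each synthetic dataset reproducible, which matters when comparing policies trained on the same randomized distribution.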

G. Global Competition and Developments

Beyond the US, China is a significant player, with companies like "X Robot" and "Unitree" showcasing advanced robots. While some demos still rely on human operators, the pace of development is rapid. The potential for Google's models to "fill in the blanks where Chinese embodied AI is lacking right now" is noted.

Conclusion

The "inflection point" for physical AI, as described by investors, is rapidly approaching. The ability of new AI models, particularly Google's Gemini Robotics, to confer generalisation, interactivity, and dexterity on humanoid robots is a game-changer. The vertical integration strategy adopted by companies like Figure AI, coupled with advancements in simulation and training from Nvidia, is accelerating progress. As commercial deployments begin and valuations soar, the vision of "bringing humanoids into the household setting" and widespread industrial application appears increasingly inevitable, with 2025 and beyond poised for significant commercialisation and scaling. Mark Gurman of Bloomberg succinctly summarised the broader implication: "artificial intelligence is going to be at the core of everything, and really the ultimate hardware expression of AI is robotics: being able to understand how a human acts, artificially learn from data, and mimic a human, and that's what a robot is."