MolmoAct2 Brings Open-Source Reasoning to Robotics
- MolmoAct2 introduces an open-weight action reasoning model for universal robotic control.
- Features the MolmoER backbone, trained on 3.3 million samples for spatial and embodied reasoning.
- Surpasses Gemini Robotics ER-1.5 and GPT-5 across 13 embodied-reasoning benchmarks.
The landscape of robotics is undergoing a quiet but profound transformation. For years, the dream of a generalist robot—a single machine capable of performing diverse, complex tasks in unpredictable environments—has been tethered to either opaque, proprietary software or highly specialized, expensive hardware configurations. This gap between the theoretical capabilities of modern AI and the practical requirements of real-world deployment has long hindered progress. Enter MolmoAct2, a new open-source system that aims to bridge this divide by democratizing access to high-level action reasoning for robots.
At its core, MolmoAct2 is designed to address the "three hurdles" of robotics: reasoning, latency, and accessibility. Previous models often faltered because they treated the robot's physical movements as an afterthought, bolted onto an existing language model. MolmoAct2 changes this with a specialized backbone called MolmoER, tuned specifically for spatial and embodied intelligence. By training the model on a corpus of 3.3 million samples, the researchers have taught it not just to "see" a scene, but to understand the physical relationships between objects, the space around them, and the robot's own limbs.
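To make that concrete, a spatial-reasoning training sample of the kind described above might look something like the following. This is purely a hypothetical illustration; the field names and values are assumptions, not the released dataset's schema.

```python
# Hypothetical illustration only: one embodied/spatial-reasoning sample of the
# general kind described above. Field names and coordinates are invented for
# illustration and do not reflect the released data format.
sample = {
    "image": "tabletop_scene_0042.jpg",  # RGB observation of the workspace
    "question": "Can the left gripper reach the mug without disturbing the plate?",
    # A grounded reasoning trace ties language to geometry in the image,
    # rather than reasoning over text alone.
    "reasoning": [
        "The mug is at pixel (312, 198), roughly 0.4 m from the left gripper at (105, 240).",
        "The plate at (290, 260) does not block a straight-line approach to the mug.",
    ],
    "answer": "Yes, the left gripper can reach the mug directly.",
}
```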
One of the most innovative aspects of this release is how it manages the "reasoning tax" that typically slows robots down. Historically, complex reasoning models have been too sluggish for dynamic tasks where milliseconds count. The team's answer is "MolmoThink," an adaptive-depth reasoning engine. Instead of re-evaluating the entire scene on every new frame, it concentrates its processing on the regions of the environment that have changed. This selective attention allows the system to maintain geometric grounding without the heavy latency that has plagued its predecessors.
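The general pattern is easy to picture. The sketch below is a simplification of that idea, not MolmoThink's actual implementation: it re-encodes only the image tiles that changed between frames and reuses cached features everywhere else. The tile size, threshold, and `encode_tile` stand-in are assumptions.

```python
import numpy as np

TILE = 32          # side length of a square image tile, in pixels (assumed)
THRESHOLD = 8.0    # mean absolute pixel difference above which a tile counts as "changed"

def encode_tile(tile: np.ndarray) -> np.ndarray:
    """Stand-in for the expensive visual encoder applied to a single tile."""
    return tile.astype(np.float32).mean(axis=(0, 1))  # placeholder feature vector

def update_features(prev_frame: np.ndarray, frame: np.ndarray, cache: dict) -> dict:
    """Re-encode only tiles whose pixels changed; keep cached features for the rest."""
    h, w = frame.shape[:2]
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            cur = frame[y:y + TILE, x:x + TILE]
            old = prev_frame[y:y + TILE, x:x + TILE]
            diff = np.abs(cur.astype(np.int16) - old.astype(np.int16)).mean()
            if diff > THRESHOLD:
                cache[(y, x)] = encode_tile(cur)  # changed region: pay the encoding cost
            # unchanged tiles keep their cached features, saving most of the compute
    return cache
```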
Furthermore, the project addresses the scarcity of high-quality training data, a massive bottleneck in robotics. The team is releasing MolmoAct2-BimanualYAM, the largest open-source dataset of teleoperated bimanual (two-handed) trajectories currently available. Combined with an open-weight action tokenizer dubbed OpenFAST, this effectively hands the robotics community a comprehensive toolkit. The goal is clear: lower the barrier to entry so that developers and researchers can build intelligent, adaptable agents without rebuilding foundational models from scratch.
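An action tokenizer's job is to turn continuous robot commands into discrete tokens that a language-model backbone can predict like text. The sketch below illustrates that idea with naive uniform binning; this is an assumed, simplified scheme for illustration, not OpenFAST's actual method, and the bin count and action range are assumptions.

```python
import numpy as np

N_BINS = 256            # vocabulary size per action dimension (assumed)
LOW, HIGH = -1.0, 1.0   # normalized action range (assumed)

def tokenize(actions: np.ndarray) -> np.ndarray:
    """Map continuous actions in [LOW, HIGH] to integer token IDs in [0, N_BINS - 1]."""
    clipped = np.clip(actions, LOW, HIGH)
    return np.round((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1)).astype(np.int64)

def detokenize(tokens: np.ndarray) -> np.ndarray:
    """Invert tokenize(): recover quantized continuous actions from token IDs."""
    return tokens.astype(np.float64) / (N_BINS - 1) * (HIGH - LOW) + LOW

# Example: a 7-DoF action (6 pose deltas + gripper) becomes 7 discrete tokens.
action = np.array([0.12, -0.30, 0.05, 0.0, 0.8, -0.55, 1.0])
tokens = tokenize(action)
recovered = detokenize(tokens)  # close to the original, up to quantization error
```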
The implications of this work extend beyond just improved benchmarks. While the system demonstrates performance that competes with, and in several domains exceeds, closed-source models from giants like Google and OpenAI, its true value lies in its open nature. By releasing the model weights, training code, and data, the project invites a global community to iterate on the software, potentially accelerating the transition of robotics from isolated lab demonstrations into versatile, reliable tools for everyday use.