MobileForge Enhances Mobile GUI Agents Without Human Annotations

• MobileForge introduces annotation-free adaptation for mobile GUI agents using Hierarchical Feedback-Guided Policy Optimization (HiFPO).
• The ForgeOwl-8B model achieves 77.6% Pass@3 on AndroidWorld and a 41.0% success rate on the out-of-domain MobileWorld GUI-only split.
• The system uses MobileGym to ground task generation and evaluation in real app interactions, and HiFPO to turn step-level process feedback into GRPO updates.

MobileForge is a newly developed annotation-free adaptation system designed to improve mobile GUI agents by grounding task generation and evaluation directly in real app interactions. Existing approaches to training such agents face two obstacles: the high cost of human-written tasks, demonstrations, and reward labels, and the rapid update cycles of mobile applications. To address these, MobileForge introduces MobileGym, a framework for interaction grounding, alongside Hierarchical Feedback-Guided Policy Optimization (HiFPO). HiFPO converts trajectory outcomes, step-level process feedback, and corrective hints into hint-contextualized GRPO updates. GRPO (Group Relative Policy Optimization) is a reinforcement learning method that scores each sampled response relative to the other responses sampled for the same prompt.
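To make the GRPO part of this concrete, here is a minimal sketch of the group-relative normalization that GRPO is generally built on. It is not the MobileForge implementation: the function name `group_relative_advantages`, the `eps` term, and the example rewards are assumptions made for illustration, and the summary does not say how HiFPO combines trajectory outcomes, step-level feedback, and hints into a reward.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the group it was sampled with.

    `rewards` holds one scalar per rollout of the same task. How those
    scalars are produced (outcome, step feedback, hints) is outside this
    sketch and is an assumption, not something the source specifies.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one task: two succeeded, two failed.
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(group_rewards)
```

Rollouts that beat the group average receive a positive advantage and the rest a negative one, so no separately trained value model is needed to decide which responses to reinforce. When every rollout in a group scores the same, all advantages are zero and that group contributes no learning signal.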

Using only automatically generated data, MobileForge adapted the Qwen3-VL-8B model to reach 67.2% Pass@3 on the AndroidWorld benchmark, close to the 69.0% of the closed-data, GUI-specialized model GUI-Owl-1.5-8B. The adapted ForgeOwl-8B model achieved 77.6% Pass@3 on AndroidWorld and a 41.0% success rate on the out-of-domain MobileWorld GUI-only split. According to the researchers, this makes ForgeOwl-8B the strongest open-data mobile GUI agent currently available. They have committed to releasing the code, data, and trained models via their project website.
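For readers unfamiliar with the metric, the sketch below shows one common reading of Pass@k: a task counts as passed if at least one of its first k attempts succeeds, and the score is the fraction of tasks passed. The summary does not state how the researchers compute Pass@3, so this definition, the function name `pass_at_k`, and the task names are assumptions for illustration only.

```python
def pass_at_k(results, k):
    """Fraction of tasks with at least one success in the first k attempts.

    `results` maps a task name to a list of booleans, one per attempt.
    """
    if not results:
        raise ValueError("results must contain at least one task")
    if k < 1:
        raise ValueError("k must be at least 1")
    passed = sum(1 for attempts in results.values() if any(attempts[:k]))
    return passed / len(results)


# Hypothetical outcomes for four tasks, three attempts each.
example_results = {
    "set_alarm": [False, True, False],
    "send_message": [False, False, False],
    "add_contact": [True, True, True],
    "toggle_wifi": [False, False, True],
}
score = pass_at_k(example_results, k=3)  # 3 of 4 tasks pass, so 0.75
```

Under this reading, a figure such as 77.6% Pass@3 means that in 77.6% of benchmark tasks at least one of three attempts succeeded, which is a more lenient measure than single-attempt success.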
