Qwen-AgentWorld Language Models for Agent Simulation

HuggingFace
Thursday, June 25, 2026
  • Researchers launched Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B for agentic environment simulation.
  • The models use 10M interaction trajectories and a three-stage training pipeline involving CPT, SFT, and RL.
  • The new AgentWorldBench benchmark shows Qwen-AgentWorld outperforms existing frontier models across 9 established tasks.

Researchers introduced Qwen-AgentWorld, a family of language-based world models designed to improve agentic environment simulation and downstream reasoning. The team released two versions: Qwen-AgentWorld-35B-A3B and Qwen-AgentWorld-397B-A17B. Both simulate agentic environments across 7 distinct domains using long chain-of-thought reasoning (reasoning steps generated before answering). Training draws on more than 10M environment interaction trajectories and proceeds in three stages: CPT (continual pre-training) for general capabilities, SFT (supervised fine-tuning) for next-state prediction, and RL (reinforcement learning) to refine simulation fidelity via hybrid rubric-and-rule rewards.
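
The summary does not say how the rubric and rule signals are combined in the RL stage, so the following is only a minimal sketch of one way a hybrid reward could be formed. Every name here (`RewardWeights`, `rule_reward`, `rubric_reward`, `hybrid_reward`), the exact-match rule, and the equal weighting are assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical mixing weights; the source does not state how the signals are mixed.
    rule: float = 0.5
    rubric: float = 0.5


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so formatting differences are ignored."""
    return " ".join(text.split())


def rule_reward(predicted: str, reference: str) -> float:
    """Rule signal (assumed): 1.0 if the predicted next state matches the
    reference exactly after whitespace normalization, else 0.0."""
    return 1.0 if _normalize(predicted) == _normalize(reference) else 0.0


def rubric_reward(predicted: str, rubric: Sequence[Callable[[str], bool]]) -> float:
    """Rubric signal (assumed): fraction of rubric checks the prediction passes."""
    if not rubric:
        return 0.0
    return sum(1 for check in rubric if check(predicted)) / len(rubric)


def hybrid_reward(
    predicted: str,
    reference: str,
    rubric: Sequence[Callable[[str], bool]],
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of the rule signal and the rubric signal."""
    return (
        weights.rule * rule_reward(predicted, reference)
        + weights.rubric * rubric_reward(predicted, rubric)
    )


# Toy usage: the prediction matches the reference and passes one of two checks.
example_rubric = [
    lambda s: "a.txt" in s,           # mentions the created file
    lambda s: s.startswith("error"),  # deliberately failing check
]
example_score = hybrid_reward("file  created: a.txt", "file created: a.txt", example_rubric)
```

In a real system the rubric checks would more plausibly come from a grader model than from hand-written predicates; plain callables are used here only to keep the sketch self-contained.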

To measure progress, the team released AgentWorldBench, a benchmark derived from interactions of 5 frontier models across 9 established tasks. Qwen-AgentWorld significantly outperformed existing models on this benchmark. Beyond general simulation, the model serves as a decoupled simulator for scalable agentic reinforcement learning and provides an effective warm-up phase that improves downstream performance across 7 agentic benchmarks.
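
To make the "decoupled simulator" idea concrete, here is a minimal sketch of an agent rollout in which a world model, rather than a real environment, produces each next state. The `WorldModel` protocol, the `predict_next_state` method, and the `ToyFileSystemModel` stand-in are hypothetical and written for this illustration; they are not the Qwen-AgentWorld API, which this summary does not describe.

```python
from typing import Callable, List, Protocol, Tuple

Step = Tuple[str, str]  # (action, predicted next state)


class WorldModel(Protocol):
    """Hypothetical interface: predict the next environment state as text."""

    def predict_next_state(self, history: List[Step], action: str) -> str: ...


class ToyFileSystemModel:
    """Deterministic stand-in for a learned world model (assumption).

    It "simulates" a directory by replaying `touch <name>` actions.
    """

    def predict_next_state(self, history: List[Step], action: str) -> str:
        files = [a.split(" ", 1)[1] for a, _ in history if a.startswith("touch ")]
        if action.startswith("touch "):
            files.append(action.split(" ", 1)[1])
        return "files: " + ", ".join(sorted(set(files)))


def rollout(
    policy: Callable[[str], str],
    world_model: WorldModel,
    initial_state: str,
    max_steps: int,
) -> List[Step]:
    """Run the policy against the simulator; no real environment is touched."""
    history: List[Step] = []
    state = initial_state
    for _ in range(max_steps):
        action = policy(state)
        state = world_model.predict_next_state(history, action)
        history.append((action, state))
    return history


def scripted_policy(state: str) -> str:
    """Toy policy: create a.txt first, then b.txt."""
    return "touch b.txt" if "a.txt" in state else "touch a.txt"


trajectory = rollout(scripted_policy, ToyFileSystemModel(), "files: ", max_steps=2)
```

Because the policy only sees text states returned by `predict_next_state`, the simulator can be swapped for a learned model without changing the rollout loop, which is the sense of "decoupled" assumed here.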

Tags: qwen, agentic ai, world model, reinforcement learning, chain of thought