MACE-Dance Framework Enables Music-Driven Dance Video Generation
- MACE-Dance generates music-driven dance videos using a cascaded Mixture-of-Experts architecture.
- The framework separates tasks into a Motion Expert for 3D motion generation and an Appearance Expert for video synthesis.
- MACE-Dance achieves state-of-the-art performance in 3D dance generation and pose-driven image animation.
Researchers introduced MACE-Dance, a framework for generating dance videos from music, on May 11, 2026. The system uses a cascaded Mixture-of-Experts (MoE) architecture that decouples video synthesis into motion generation and appearance preservation, addressing the difficulty existing methods have in jointly achieving high visual quality and realistic human movement.
The framework divides processing between two specialized components. The Motion Expert handles music-to-3D-motion generation, using a BiMamba-Transformer hybrid model combined with a Guidance-Free Training (GFT) strategy to ensure kinematic plausibility. The Appearance Expert then manages video synthesis, preserving the dancer's identity and the video's spatiotemporal coherence.
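The cascaded division of labor can be sketched as a two-stage pipeline: motion is generated from music first, and appearance is rendered afterward, conditioned on that motion. The class names, tensor shapes, and interfaces below are hypothetical stand-ins for illustration, not the authors' actual implementation.

```python
import numpy as np

class MotionExpert:
    """Stage 1 (stub): maps music features to a 3D pose sequence."""
    def generate_motion(self, music_features: np.ndarray) -> np.ndarray:
        # music_features: (T, F) per-frame audio features.
        # Returns (T, J, 3): one 3D position per joint per frame.
        T = music_features.shape[0]
        num_joints = 24  # assumed SMPL-like skeleton size
        return np.zeros((T, num_joints, 3))

class AppearanceExpert:
    """Stage 2 (stub): renders frames from poses plus a reference image."""
    def synthesize_video(self, poses: np.ndarray,
                         reference_image: np.ndarray) -> np.ndarray:
        # poses: (T, J, 3); reference_image: (H, W, 3).
        # Returns (T, H, W, 3): one frame per pose, identity taken
        # from the reference image.
        T = poses.shape[0]
        H, W, _ = reference_image.shape
        return np.broadcast_to(reference_image, (T, H, W, 3)).copy()

def cascaded_pipeline(music_features: np.ndarray,
                      reference_image: np.ndarray) -> np.ndarray:
    # Cascade: motion first, then appearance conditioned on that motion.
    poses = MotionExpert().generate_motion(music_features)
    return AppearanceExpert().synthesize_video(poses, reference_image)

music = np.random.rand(8, 128)   # 8 frames of 128-dim audio features
ref = np.random.rand(64, 64, 3)  # reference image of the dancer
video = cascaded_pipeline(music, ref)
print(video.shape)  # (8, 64, 64, 3)
```

The key design point the sketch captures is the decoupling: the Appearance Expert never sees the music, only the pose sequence, so kinematic quality and visual quality can be optimized by separate specialized models.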
The system demonstrates state-of-the-art (SOTA) performance in both 3D dance generation and pose-driven image animation. To validate these results, the authors curated a new large-scale dataset and established a dedicated motion-appearance evaluation protocol.