Stabilizing 1000-Layer Diffusion Transformers

HuggingFace
Tuesday, May 12, 2026
  • Researchers identify 'Mean Mode Screaming' as a collapse trigger in deep models
  • New MV-Split Residuals method enables stable training of 1000-layer architectures
  • MV-Split avoids dampening signal-bearing modes, allowing faster convergence vs. LayerScale

Research published on May 7, 2026, introduces a method called Mean-Variance Split (MV-Split) Residuals to stabilize Diffusion Transformers (DiT) at extreme depths. The technique addresses a structural instability identified as "Mean Mode Screaming" (MMS), a phenomenon where deep models enter a silent, mean-dominated collapse state that homogenizes token representations and suppresses centered variation.
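To make "mean-dominated" concrete, here is a minimal diagnostic sketch. It is an illustration only, not taken from the paper: it assumes the "mean" is the average over tokens and measures how much of a representation's energy sits in that shared mean rather than in the centered, per-token variation. The function name `mean_energy_fraction` and the `(tokens, channels)` layout are hypothetical choices.

```python
import numpy as np

def mean_energy_fraction(x):
    """Fraction of squared norm carried by the token-mean component.

    x: array of shape (tokens, channels).
    Assumption (not from the article): "mean" is the average over tokens.
    Returns a value in [0, 1]; values near 1 indicate homogenized tokens.
    """
    mean = x.mean(axis=0, keepdims=True)        # shared component, shape (1, C)
    centered = x - mean                         # per-token variation
    mean_energy = x.shape[0] * np.sum(mean ** 2)
    centered_energy = np.sum(centered ** 2)
    total = mean_energy + centered_energy       # equals np.sum(x ** 2)
    return float(mean_energy / total) if total > 0 else 0.0

# Identical tokens: all energy is in the mean.
collapsed = np.ones((4, 3))
# Tokens that cancel out: all energy is in the centered part.
varied = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(mean_energy_fraction(collapsed), mean_energy_fraction(varied))
```

Under these assumptions, a value drifting toward 1 across depth would correspond to the homogenized, mean-dominated state described above.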

MMS occurs when a mean-coherent backward shock on residual writers drives the network into a collapsed state, effectively suppressing necessary data signals. While existing depth stabilizers like LayerScale mitigate collapse, they dampen both mean and signal-bearing modes, which the researchers found slows model convergence. The MV-Split approach instead regulates the mean path separately from the signal-bearing centered path, maintaining stability without sacrificing data fidelity.
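The contrast with LayerScale can be sketched in a few lines. The article does not give the exact MV-Split formulation, so the following is a simplified illustration under stated assumptions: the branch output is split into its token-mean and its centered remainder, and each part gets its own scalar gain. The names `mv_split_residual`, `mean_gain`, and `centered_gain` are hypothetical, and a real LayerScale uses a learned per-channel scale rather than the single scalar shown here.

```python
import numpy as np

def split_mean_centered(update):
    """Split a branch output (tokens, channels) into token-mean and centered parts."""
    mean = update.mean(axis=0, keepdims=True)
    return mean, update - mean

def layerscale_residual(x, update, gamma):
    """LayerScale-style residual (simplified): one gain scales the whole update,
    so the mean and the centered signal are damped together."""
    return x + gamma * update

def mv_split_residual(x, update, mean_gain, centered_gain=1.0):
    """Hypothetical sketch of a mean/centered split residual.

    Assumption (not from the article): separate scalar gains on the two paths.
    A small mean_gain restrains the mean path while the centered path
    passes through at centered_gain.
    """
    mean, centered = split_mean_centered(update)
    return x + mean_gain * mean + centered_gain * centered

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
update = rng.normal(size=(5, 4)) + 3.0   # branch output with a large shared mean

out = mv_split_residual(x, update, mean_gain=0.0)
# With mean_gain=0 the token mean of the stream is unchanged by the update.
print(np.allclose(out.mean(axis=0), x.mean(axis=0)))
```

Setting both gains to the same value recovers the simplified LayerScale form, which is one way to see the trade-off the article describes: a single gain cannot restrain the mean path without also shrinking the centered signal.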

The method prevented divergent collapse in a 400-layer single-stream DiT, tracking close to baseline trajectories while outperforming LayerScale across the schedule. The team then trained a 1000-layer DiT, demonstrating that the architecture remains stably trainable at extreme depth.

#diffusion transformers · #model collapse · #training stability · #residual networks · #deep learning architecture