
Flow-OPD Introduces Two-Stage Alignment for Flow Matching

HuggingFace
Tuesday, May 12, 2026
  • Flow-OPD framework addresses reward sparsity and gradient interference in Flow Matching image models
  • Two-stage alignment combines GRPO teacher specialization with Manifold Anchor Regularization
  • Stable Diffusion 3.5 Medium benchmarks show GenEval score increase from 63 to 92

Flow-OPD is a new post-training framework for Flow Matching text-to-image models, designed to address reward sparsity and gradient interference. Published on May 8, 2026, it targets multi-task alignment through a two-stage strategy.

The approach first cultivates specialized teacher models using GRPO (Group Relative Policy Optimization). It then consolidates these into a single student policy through on-policy sampling and dense trajectory-level supervision. To prevent aesthetic degradation, the researchers introduced Manifold Anchor Regularization, which anchors generation to a high-quality data manifold.
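The two stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 1-D toy velocity fields, and the Euler rollout are all assumptions made for illustration. The first function shows the usual group-relative normalization behind GRPO (reward minus group mean, divided by group standard deviation). The second shows one plausible reading of "on-policy sampling with dense trajectory-level supervision": the student generates the trajectory, and the teacher scores every state the student visits. Manifold Anchor Regularization is left out because the summary does not specify its form.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one prompt's sample group (GRPO-style)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def on_policy_distill_loss(student_v, teacher_v, x0, num_steps=10):
    """Hypothetical dense distillation loss on a student-generated path.

    student_v and teacher_v are velocity fields v(x, t) on a 1-D toy state.
    The student is integrated with Euler steps, and the squared velocity
    gap to the teacher is averaged over every step of the trajectory.
    """
    dt = 1.0 / num_steps
    x, total = x0, 0.0
    for i in range(num_steps):
        t = i * dt
        v_s = student_v(x, t)
        v_t = teacher_v(x, t)
        total += (v_s - v_t) ** 2  # supervision at every step, not only the end
        x = x + dt * v_s  # on-policy: the next state comes from the student
    return total / num_steps


def teacher(x, t):
    """Toy teacher velocity field (assumption)."""
    return 1.0 - x


def student(x, t):
    """Toy student velocity field with a constant offset from the teacher."""
    return 0.5 - x


# Identical rewards across a group give zero advantages, so no learning signal.
sparse = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

loss_mismatch = on_policy_distill_loss(student, teacher, x0=0.0)
loss_match = on_policy_distill_loss(teacher, teacher, x0=0.0)
```

In this toy setting, a group whose samples all receive the same reward yields all-zero advantages, which is one way to picture the reward sparsity the framework targets. The distillation loss, by contrast, produces a non-zero signal at every step where the student and teacher disagree.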

Evaluations on Stable Diffusion 3.5 Medium show that the framework raised the GenEval score from 63 to 92 and OCR accuracy from 59 to 94. According to the reported results, this is an improvement of roughly 10 points over vanilla GRPO, with image fidelity preserved.

Tags: flow matching, on-policy distillation, text-to-image, GRPO, Stable Diffusion 3.5