Flow-OPD Introduces Two-Stage Alignment for Flow Matching

• Flow-OPD framework addresses reward sparsity and gradient interference in Flow Matching image models
• Two-stage alignment combines GRPO teacher specialization with Manifold Anchor Regularization
• Stable Diffusion 3.5 Medium benchmarks show GenEval score increase from 63 to 92

Flow-OPD is a new post-training framework for Flow Matching text-to-image models, designed to resolve reward sparsity and gradient interference. Published on May 8, 2026, the framework improves multi-task alignment through a two-stage strategy.

The approach first cultivates specialized teacher models using GRPO (Group Relative Policy Optimization). It then consolidates these into a single student policy through on-policy sampling and dense trajectory-level supervision. To prevent aesthetic degradation, the researchers introduced Manifold Anchor Regularization, which anchors generation to a high-quality data manifold.
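The two stages can be illustrated with a minimal Python sketch on a one-dimensional toy problem. The first function shows the group-relative advantage that gives GRPO its name, where each sample's reward is standardized within its own sampling group. The other two functions show one plausible reading of "on-policy sampling and dense trajectory-level supervision": the student rolls out its own trajectory and is scored against a teacher at every visited step. All function names, the Euler integrator, and the squared-error loss are assumptions made for illustration, not the authors' implementation. Manifold Anchor Regularization is left out because the text does not specify its form.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward within its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations differ
    return [(r - mean) / (std + eps) for r in rewards]


def euler_rollout(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with Euler steps.

    Returns the (x, t) pairs visited before each step.
    """
    x, dt = x0, 1.0 / num_steps
    visited = []
    for k in range(num_steps):
        t = k * dt
        visited.append((x, t))
        x = x + dt * velocity_fn(x, t)
    return visited


def on_policy_distillation_loss(student_v, teacher_v, x0, num_steps=8):
    """Mean squared velocity gap, evaluated on the student's own trajectory.

    Hypothetical loss: the supervision is "dense" because every visited
    step contributes, and "on-policy" because the student chooses the states.
    """
    visited = euler_rollout(student_v, x0, num_steps)
    gaps = [(student_v(x, t) - teacher_v(x, t)) ** 2 for x, t in visited]
    return sum(gaps) / len(gaps)


# Toy usage: four samples in one group, two of which earned a reward.
advantages = group_relative_advantages([0.0, 1.0, 1.0, 0.0])

# Toy velocity fields that pull a scalar state toward different targets.
teacher = lambda x, t: 2.0 - x
student = lambda x, t: 1.5 - x
loss = on_policy_distillation_loss(student, teacher, x0=0.0)
```

In this toy setting the rewarded samples receive positive advantages and the unrewarded ones negative, while the distillation loss stays positive until the student's velocity field matches the teacher's along the student's own path.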

Evaluations on Stable Diffusion 3.5 Medium show that the framework raised the GenEval score from 63 to 92 and OCR accuracy from 59 to 94. These results represent an improvement of roughly 10 points over vanilla GRPO while preserving image fidelity.
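For a sense of scale, the absolute gains implied by the reported before and after scores can be worked out directly. The snippet below is only arithmetic on the figures quoted above; the dictionary layout and names are illustrative. It does not derive the comparison with vanilla GRPO, since the text does not give the vanilla GRPO scores.

```python
# Scores as reported in the text (before, after) on Stable Diffusion 3.5 Medium.
reported = {"GenEval": (63, 92), "OCR accuracy": (59, 94)}


def absolute_gain(before, after):
    """Point difference between the after and before scores."""
    return after - before


gains = {name: absolute_gain(b, a) for name, (b, a) in reported.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain} points")
# GenEval: +29 points
# OCR accuracy: +35 points
```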
