On-policy distillation with specialized teachers can resolve conflicting optimization goals in multi-objective image generation, achieving roughly 10-point improvements over standard reinforcement-learning approaches while maintaining quality across all metrics.
Flow-OPD is a training method that improves text-to-image models by pairing specialized teacher models with on-policy distillation, aligning multiple competing objectives such as image quality, text accuracy, and aesthetics.
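The core mechanism can be illustrated with a toy sketch: the student samples from its *own* current distribution (the on-policy part), and each specialized teacher supplies a reverse-KL supervision signal that is combined with per-objective weights. This is a minimal, hypothetical illustration over a small categorical distribution, not Flow-OPD's actual flow-matching implementation; the teacher distributions, weights, and learning rate below are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical setup: two "specialized teachers" over a 5-way categorical
# output, each expert in a different objective (e.g. quality vs. alignment).
teacher_quality   = softmax(np.array([2.0, 0.5, 0.0, -1.0, -1.0]))
teacher_alignment = softmax(np.array([0.0, 2.0, 1.0, -1.0, -1.0]))
teachers = [(0.5, teacher_quality), (0.5, teacher_alignment)]  # (weight, dist)

student_logits = np.zeros(5)

for step in range(500):
    q = softmax(student_logits)
    # On-policy: training samples come from the student's current policy,
    # not from a fixed teacher-generated dataset.
    xs = rng.choice(5, size=256, p=q)
    grad = np.zeros(5)
    for w, p in teachers:
        # Score-function estimate of the reverse-KL gradient on student
        # samples: grad KL(q||p) = E_{x~q}[(log q(x) - log p(x)) * grad log q(x)],
        # with grad log q(x) = onehot(x) - q for softmax logits.
        logr = np.log(q[xs]) - np.log(p[xs])
        onehot = np.eye(5)[xs]
        grad += w * (logr[:, None] * (onehot - q)).mean(axis=0)
    student_logits -= 0.5 * grad

q_final = softmax(student_logits)
```

Minimizing this weighted sum of reverse KLs drives the student toward the (normalized) weighted geometric mean of the teachers, so the final distribution concentrates on categories that all objectives rate highly rather than collapsing onto any single teacher.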