On-policy distillation with specialized teachers can resolve conflicting optimization goals in multi-objective image generation, achieving roughly 10-point improvements over standard reinforcement-learning approaches while maintaining quality across all metrics.
Flow-OPD is a training method that improves text-to-image models by pairing specialized teacher models with on-policy distillation, aligning multiple competing objectives such as image quality, text accuracy, and aesthetics.
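The core mechanism can be illustrated with a toy sketch: the student samples from its *own* current distribution (the on-policy part), and each specialized teacher supplies a reverse-KL supervision signal that is combined with per-objective weights. This is a minimal, hypothetical illustration over a small categorical distribution, not Flow-OPD's actual flow-matching implementation; the teacher distributions, weights, and learning rate below are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical setup: two "specialized teachers" over a 5-way categorical
# output, each expert in a different objective (e.g. quality vs. alignment).
teacher_quality   = softmax(np.array([2.0, 0.5, 0.0, -1.0, -1.0]))
teacher_alignment = softmax(np.array([0.0, 2.0, 1.0, -1.0, -1.0]))
teachers = [(0.5, teacher_quality), (0.5, teacher_alignment)]  # (weight, dist)

student_logits = np.zeros(5)

for step in range(500):
    q = softmax(student_logits)
    # On-policy: training samples come from the student's current policy,
    # not from a fixed teacher-generated dataset.
    xs = rng.choice(5, size=256, p=q)
    grad = np.zeros(5)
    for w, p in teachers:
        # Score-function estimate of the reverse-KL gradient on student
        # samples: grad KL(q||p) = E_{x~q}[(log q(x) - log p(x)) * grad log q(x)],
        # with grad log q(x) = onehot(x) - q for softmax logits.
        logr = np.log(q[xs]) - np.log(p[xs])
        onehot = np.eye(5)[xs]
        grad += w * (logr[:, None] * (onehot - q)).mean(axis=0)
    student_logits -= 0.5 * grad

q_final = softmax(student_logits)
```

Minimizing this weighted sum of reverse KLs drives the student toward the (normalized) weighted geometric mean of the teachers, so the final distribution concentrates on categories that all objectives rate highly rather than collapsing onto any single teacher.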