DanceOPD: On-Policy Generative Field Distillation

Wei Zhou, Xiongwei Zhu, Zelin Xu, Bo Dong, Lixue Gong et al. | June 25, 2026 | arXiv

Key Takeaway

Multi-task image generation models can be trained more effectively by treating each capability (text-to-image (T2I), local editing, global editing) as a separate velocity field and having the student learn to compose them on its own generated trajectories.

Summary

DanceOPD is a training framework that helps image generation models master multiple tasks (text-to-image, local editing, and global editing) without the tasks interfering with one another. It uses a distillation approach in which a student model learns from specialized 'capability fields' (velocity fields in flow-matching models), with each image routed to the right expert for its task.
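
The idea of routed, on-policy distillation can be illustrated with a small toy sketch. This is not the paper's implementation. It assumes scalar states, time-independent linear velocity fields, and made-up teacher coefficients, and every name in it (`TEACHERS`, `teacher_velocity`, `student_velocity`, `train_student`) is hypothetical. It shows only the general pattern: the student generates its own trajectory, and at each state it visits it is regressed toward the velocity of the teacher that matches the sample's task.

```python
import random

# Hypothetical frozen teachers: one linear velocity field v(x) = a*x + b per task.
# The coefficients are invented for illustration only.
TEACHERS = {
    "t2i": (-1.0, 0.5),
    "local_edit": (0.5, -0.2),
    "global_edit": (0.2, 1.0),
}

def teacher_velocity(task, x):
    """Route the state to the teacher field that matches the task."""
    a, b = TEACHERS[task]
    return a * x + b

def student_velocity(params, task, x):
    """Toy task-conditioned student: one (w, c) pair per task label."""
    w, c = params[task]
    return w * x + c

def train_student(rollouts=3000, euler_steps=8, lr=0.02, seed=0):
    rng = random.Random(seed)
    params = {task: (0.0, 0.0) for task in TEACHERS}
    tasks = sorted(TEACHERS)
    dt = 1.0 / euler_steps
    for _ in range(rollouts):
        task = rng.choice(tasks)       # sample a task for this rollout
        x = rng.uniform(-1.0, 1.0)     # toy stand-in for the initial noise sample
        for _ in range(euler_steps):
            # Evaluate both fields at the state the student itself reached.
            v_student = student_velocity(params, task, x)
            v_target = teacher_velocity(task, x)
            err = v_student - v_target
            # One SGD step on the squared velocity error, with x held constant.
            w, c = params[task]
            params[task] = (w - lr * 2.0 * err * x, c - lr * 2.0 * err)
            # On-policy: advance the state with the student's own velocity.
            x = x + dt * v_student
    return params

student = train_student()
for task in sorted(student):
    w, c = student[task]
    print(task, round(w, 3), round(c, 3))
```

In this toy setting the student's per-task coefficients move toward the matching teacher's coefficients, because every training state comes from a student rollout and every target comes from the routed teacher. A real system would replace the linear fields with neural networks over image latents and would typically condition the fields on the flow time as well.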

training architecture multimodal

Key Terms

flow-matching, velocity-field, distillation, classifier-free-guidance, on-policy-learning