Multi-task image generation models can be trained more effectively by treating each capability (text-to-image, local edit, global edit) as a separate velocity field and having a student model learn to compose these fields on its own generated trajectories.
DanceOPD is a training framework that helps image generation models master multiple tasks (text-to-image, local editing, and global editing) without the tasks interfering with one another. It uses a distillation approach in which a student model learns from specialized 'capability fields' (velocity fields in flow-matching models), with each image routed to the expert field for its task.
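
The idea of routing each sample to a task-specific velocity field while training on the student's own trajectories can be illustrated with a toy sketch. This is not the DanceOPD implementation: the teacher fields here are simple linear flows toward fixed 2-D targets, the student is a table of per-task parameters, and every name (`TEACHER_TARGETS`, `teacher_velocity`, `student_velocity`, `train`) is hypothetical and chosen only for illustration.

```python
import random

# Hypothetical stand-ins for the three capability fields. Each "expert" is a
# linear velocity field that pulls a 2-D state toward a fixed target point.
TEACHER_TARGETS = {
    "t2i": [1.0, 0.0],
    "local_edit": [0.0, 1.0],
    "global_edit": [-1.0, -1.0],
}


def teacher_velocity(task, x):
    """Velocity of the expert field routed to `task`, evaluated at state x."""
    target = TEACHER_TARGETS[task]
    return [tg - xi for tg, xi in zip(target, x)]


def student_velocity(params, task, x):
    """Toy student: one learnable target per task, same linear form."""
    return [w - xi for w, xi in zip(params[task], x)]


def train(steps=200, n_euler=4, lr=0.1, seed=0):
    rng = random.Random(seed)
    params = {task: [0.0, 0.0] for task in TEACHER_TARGETS}
    tasks = list(TEACHER_TARGETS)
    dt = 1.0 / n_euler
    for _ in range(steps):
        task = rng.choice(tasks)                  # each sample carries one task
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]    # start from noise
        for _ in range(n_euler):
            v_s = student_velocity(params, task, x)
            v_t = teacher_velocity(task, x)       # routed to the matching expert
            # Gradient of mean((v_s - v_t)^2) w.r.t. params[task]; the visited
            # state x is treated as a constant (no gradient through the rollout).
            grad = [2.0 * (a - b) / len(x) for a, b in zip(v_s, v_t)]
            params[task] = [w - lr * g for w, g in zip(params[task], grad)]
            # Advance along the student's own velocity, so later states are
            # ones the student itself generated (on-policy).
            x = [xi + dt * vi for xi, vi in zip(x, v_s)]
    return params


if __name__ == "__main__":
    for task, learned in train().items():
        print(task, [round(w, 3) for w in learned])
```

In this sketch the student is supervised only at states it reaches by integrating its own velocity, and each update touches only the parameters for the sampled task. A real model would share parameters across tasks and use learned networks for both student and experts; the toy setup only shows the routing and the on-policy rollout.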