Self-distillation can be adapted to non-autoregressive language models by having the model learn from its own future outputs rather than from privileged prefixes; the resulting method outperforms reinforcement learning baselines while using 10x fewer training steps.
This paper introduces d-OPSD, a self-distillation method designed specifically for diffusion language models (dLLMs), which generate text in arbitrary order rather than left to right.