Learning from the Self-future: On-policy Self-distillation for dLLMs

Yifu Luo, Zeyu Chen, Haoyu Wang, Xinhao Hu, Yuxuan Zhang et al. | June 16, 2026 | arXiv

Key Takeaway

Self-distillation can be adapted for non-autoregressive language models by learning from the model's own future outputs rather than privileged prefixes, achieving better results with 10x fewer training steps than reinforcement learning baselines.

Summary

This paper introduces d-OPSD, a self-distillation method designed specifically for diffusion language models (dLLMs) that generate text in arbitrary order rather than left-to-right.

training efficiency

Key Terms

self-distillation, diffusion-language-model, on-policy-learning, step-level-supervision