A2D2: Fine-Tuning Any-Length Discrete Diffusion for Adaptive Decoding

Sophia Tang, Yuchen Zhu, Molei Tao, Pranam Chatterjee | June 11, 2026 | arXiv

Key Takeaway

You can now fine-tune discrete diffusion models for any-length generation with theoretical convergence guarantees. The method optimizes the token insertion and unmasking policies jointly, improving reward alignment while preserving flexible-length generation.

Summary

A2D2 enables reward-guided fine-tuning of discrete diffusion models that generate sequences of any length. The method jointly optimizes how tokens are inserted and unmasked during generation, plus the inference schedule, using a theoretically grounded approach that converges to reward-optimized outputs without needing target examples.
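To make the two decoding actions concrete, here is a minimal toy sketch of an any-length decoding loop that alternates insertion and unmasking. It is not the A2D2 algorithm: the learned insertion and unmasking policies are replaced by fixed probabilities, there is no reward or fine-tuning step, and every name (`toy_decode`, `p_insert`, `p_unmask`, `MASK`, `VOCAB`) is a hypothetical chosen for illustration.

```python
import random

MASK = "<mask>"                # hypothetical mask symbol
VOCAB = ["A", "C", "G", "T"]   # hypothetical toy vocabulary


def toy_decode(steps=8, p_insert=0.3, p_unmask=0.5, seed=0):
    """Toy any-length decoding: grow the sequence with masks, then reveal some."""
    rng = random.Random(seed)
    seq = [MASK]  # start from a single masked slot
    for step in range(steps):
        last = step == steps - 1

        # Insertion (stand-in for a learned policy): after each position,
        # add a new masked slot with probability p_insert. Skipped on the
        # final step so that no fresh masks are left unrevealed.
        if not last:
            grown = []
            for tok in seq:
                grown.append(tok)
                if rng.random() < p_insert:
                    grown.append(MASK)
            seq = grown

        # Unmasking (stand-in for a learned policy): reveal each mask with
        # probability p_unmask; on the final step, reveal all of them.
        seq = [
            rng.choice(VOCAB)
            if tok == MASK and (last or rng.random() < p_unmask)
            else tok
            for tok in seq
        ]
    return seq


if __name__ == "__main__":
    out = toy_decode()
    print(len(out), "".join(out))
```

The final length is not fixed in advance; it depends on how often the insertion step fires. In this sketch the quantities a fine-tuning method would adjust correspond to `p_insert`, `p_unmask`, and the number of `steps`, which loosely mirror the insertion policy, unmasking policy, and inference schedule named in the summary.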

training reasoning efficiency

Key Terms

discrete-diffusion-models reward-guided-fine-tuning radon-nikodym-derivative token-insertion inference-schedule