You can now fine-tune discrete diffusion models for any-length generation with theoretical guarantees. The method jointly optimizes the token insertion and unmasking policies, improving reward alignment while maintaining generation flexibility.
A2D2 enables reward-guided fine-tuning of discrete diffusion models that generate sequences of any length. It jointly optimizes how tokens are inserted and unmasked during generation, along with the inference schedule, using a theoretically grounded approach that converges to reward-optimized outputs without requiring target examples.
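
To make the terms "insertion" and "unmasking" concrete, here is a toy sketch of an any-length masked generation loop. It is not A2D2's algorithm: both policies are plain random draws, the linear unmasking ramp is an assumed stand-in for an inference schedule, and every name (`insert_masks`, `unmask`, `generate`, `MASK`, `VOCAB`) is hypothetical. In reward-guided fine-tuning, these random choices would presumably be replaced by learned policies.

```python
import random

MASK = "<mask>"          # hypothetical placeholder for a not-yet-decided token
VOCAB = ["a", "b", "c"]  # toy vocabulary, assumed for illustration only


def insert_masks(seq, insert_prob, rng):
    """Insertion step: maybe add one new masked slot in each gap of the sequence."""
    out = []
    for gap in range(len(seq) + 1):
        if rng.random() < insert_prob:
            out.append(MASK)
        if gap < len(seq):
            out.append(seq[gap])
    return out


def unmask(seq, unmask_prob, rng):
    """Unmasking step: replace some masked slots with a sampled token."""
    return [
        rng.choice(VOCAB) if tok == MASK and rng.random() < unmask_prob else tok
        for tok in seq
    ]


def generate(num_steps=8, insert_prob=0.3, seed=0):
    """Alternate insertion and unmasking; the length is not fixed in advance."""
    rng = random.Random(seed)
    seq = []
    for step in range(num_steps):
        seq = insert_masks(seq, insert_prob, rng)
        # Assumed toy schedule: unmask probability ramps linearly up to 1.0,
        # so the final step resolves every remaining mask.
        seq = unmask(seq, (step + 1) / num_steps, rng)
    return seq


if __name__ == "__main__":
    print(generate())
```

Because the sequence grows through insertions rather than starting from a fixed number of masked slots, different seeds can yield outputs of different lengths, which is the flexibility the summary refers to.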