Discrete diffusion models have a hidden training-inference mismatch: the standard objective does not optimize the quantity the model actually relies on during sampling. Switching to the correct "leave-one-out" parameterization and an absorbing-state reformulation improves generation quality without retraining.
This paper pins down that mismatch for Uniform Diffusion Models, where the model is trained for one target but queried for another at generation time, and provides mathematical conversions that align the two.
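For background on the two noise families named above, here is a minimal sketch of the standard forward marginals q(x_t | x_0) for uniform-noise and absorbing-state (masked) discrete diffusion, in the common D3PM-style form. It is not the paper's leave-one-out parameterization or its conversion. The function names and the keep-probability `alpha_t` are illustrative assumptions. The sketch does show one structural difference: under uniform noise, observing x_t equal to x_0 is ambiguous, because the token may have been kept or resampled to itself. Under absorbing noise, any unmasked token is guaranteed to be the clean one.

```python
import numpy as np


def uniform_marginal(x0: int, alpha_t: float, vocab_size: int) -> np.ndarray:
    """q(x_t | x_0) for uniform-noise discrete diffusion (assumed D3PM-style kernel).

    With probability alpha_t the token is kept; otherwise it is resampled
    uniformly over all vocab_size tokens, which includes x0 itself.
    """
    probs = np.full(vocab_size, (1.0 - alpha_t) / vocab_size)
    probs[x0] += alpha_t  # kept mass plus x0's share of the uniform resample
    return probs


def absorbing_marginal(x0: int, alpha_t: float, vocab_size: int) -> np.ndarray:
    """q(x_t | x_0) for absorbing-state (masked) diffusion.

    Index vocab_size is a hypothetical extra [MASK] state; with probability
    1 - alpha_t the token is replaced by it, otherwise it stays x0.
    """
    probs = np.zeros(vocab_size + 1)
    probs[x0] = alpha_t
    probs[vocab_size] = 1.0 - alpha_t
    return probs


def sample_xt(probs: np.ndarray, rng: np.random.Generator) -> int:
    """Draw one noisy token index from a marginal distribution."""
    return int(rng.choice(len(probs), p=probs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = uniform_marginal(x0=2, alpha_t=0.6, vocab_size=5)
    a = absorbing_marginal(x0=2, alpha_t=0.6, vocab_size=5)
    print("uniform:  ", u)
    print("absorbing:", a)
    print("samples:", sample_xt(u, rng), sample_xt(a, rng))
```

In the uniform case, the probability at x0 (0.68 here) mixes "kept" and "resampled to the same token." The absorbing case keeps those events separate, which is one intuition for why an absorbing-state view can be easier to reason about.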