Masked diffusion models decode differently from autoregressive LLMs: they uncover entities first, then relations. Supervised fine-tuning disrupts this natural strategy, and a simple inference-time fix yields significant quality gains.
This paper studies how masked diffusion language models generate text from graphs by analyzing the order in which tokens are unmasked. The authors find that these models naturally unmask entity tokens before relational words, but fine-tuning breaks this strategy by locking in sentence-ending tokens too early.
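To make the idea of an unmasking order concrete, here is a minimal sketch under strong simplifying assumptions. It uses confidence-based greedy unmasking over a fixed list of per-position predictions. A real model would recompute confidences after every step, and this sketch does not. The `Prediction` class, the category labels, and the `defer_eos` option are all hypothetical. `defer_eos` only illustrates what "locking in sentence-ending tokens too early" means and one way to hold such tokens back. It is not claimed to be the paper's inference-time fix.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    token: str
    confidence: float
    category: str  # hypothetical labels: "entity", "relation", "eos"


def unmask_order(preds, per_step=1, defer_eos=False):
    """Greedily reveal the most confident masked positions; return the step each position is revealed.

    Simplification: confidences are fixed, not recomputed by a model after each step.
    """
    masked = set(range(len(preds)))
    step_of = [None] * len(preds)
    step = 0
    while masked:
        # Highest confidence first; ties broken by position for determinism.
        candidates = sorted(masked, key=lambda i: (-preds[i].confidence, i))
        if defer_eos:
            # Hypothetical heuristic: keep end-of-sentence tokens masked
            # until every other position has been revealed.
            non_eos = [i for i in candidates if preds[i].category != "eos"]
            if non_eos:
                candidates = non_eos
        for i in candidates[:per_step]:
            step_of[i] = step
            masked.remove(i)
        step += 1
    return step_of


def mean_step_by_category(preds, step_of):
    """Average reveal step per category (lower means unmasked earlier)."""
    sums = {}
    for p, s in zip(preds, step_of):
        total, count = sums.get(p.category, (0, 0))
        sums[p.category] = (total + s, count + 1)
    return {c: total / count for c, (total, count) in sums.items()}


# Toy predictions for "Alice works at Acme ." (made-up confidences).
EXAMPLE = [
    Prediction("Alice", 0.95, "entity"),
    Prediction("works", 0.60, "relation"),
    Prediction("at", 0.55, "relation"),
    Prediction("Acme", 0.90, "entity"),
    Prediction(".", 0.97, "eos"),
]

if __name__ == "__main__":
    for flag in (False, True):
        order = unmask_order(EXAMPLE, defer_eos=flag)
        print(flag, order, mean_step_by_category(EXAMPLE, order))
```

With these made-up confidences, plain greedy unmasking commits the period at step 0, before any entity. With `defer_eos=True` the entities come first, then the relational words, and the period comes last. Averaging reveal steps per category is one simple way to summarize such an order.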