Diffusion language models can now use speculative decoding, a proven speedup technique from autoregressive models, through a simple masking strategy that preserves valid token context during verification.
This paper introduces SimSD, a technique that speeds up diffusion language models by letting them verify multiple draft tokens at once, much as speculative decoding does in autoregressive models. The key innovation is a masking strategy that gives the diffusion model the context it needs to check draft predictions efficiently, achieving up to 7.46x faster inference without sacrificing quality.
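To make the idea of verifying several draft tokens at once concrete, here is a minimal sketch of the generic greedy acceptance rule used in speculative decoding. It is not SimSD's masking strategy, which this summary does not specify. The function name `accept_draft` and the assumption that the verifier returns one prediction per draft position from a single parallel pass are illustrative only.

```python
from typing import List, Sequence


def accept_draft(draft: Sequence[int], verified: Sequence[int]) -> List[int]:
    """Generic greedy verification step (hypothetical helper, not SimSD itself).

    draft[i]    -- token proposed by the cheap drafter at position i
    verified[i] -- token the verifier predicts at position i, assumed to come
                   from one parallel pass over all draft positions
    Returns the tokens to commit: the longest agreeing prefix, plus the
    verifier's own token at the first disagreement, if there is one.
    """
    accepted: List[int] = []
    for d, v in zip(draft, verified):
        if d != v:
            # First mismatch: trust the verifier and discard the rest of the draft.
            accepted.append(v)
            break
        accepted.append(d)
    return accepted


if __name__ == "__main__":
    # The first two draft tokens agree; the third is corrected by the verifier.
    print(accept_draft([5, 7, 9, 2], [5, 7, 4, 2]))
```

Under this rule, every round commits at least one token whenever the draft is non-empty, so the output matches what the verifier alone would have produced greedily; the speedup comes from committing several tokens per verifier pass when the draft is mostly right.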