SimSD: Simple Speculative Decoding in Diffusion Language Models

Junxia Cui, Haotian Ye, Runchu Tian, Hongcan Guo, Jinya Jiang et al. | June 1, 2026 | arXiv

Key Takeaway

Diffusion language models can now use speculative decoding, a proven speedup technique from autoregressive models, through a simple masking strategy that preserves valid token context during verification.

Summary

This paper introduces SimSD, a technique that speeds up diffusion language models by letting them verify multiple draft tokens at once, as speculative decoding does for autoregressive models. Its key innovation is a masking strategy that gives the diffusion model the context it needs to check draft predictions efficiently, achieving up to 7.46x faster inference without sacrificing quality.
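
As a rough illustration of the draft-then-verify idea behind speculative decoding, the Python sketch below shows one step with greedy acceptance. It is not SimSD itself: this summary does not describe how the masking strategy is built, so the sketch covers only the generic loop. The names `accept_draft`, `speculative_step`, `toy_draft`, and `toy_verify` are hypothetical, and the toy "models" are simple counting rules standing in for real networks.

```python
from typing import Callable, List, Sequence


def accept_draft(draft: Sequence[int], verified: Sequence[int]) -> List[int]:
    """Keep the longest draft prefix the verifier agrees with, then
    append the verifier's own token at the first disagreement."""
    accepted: List[int] = []
    for d, v in zip(draft, verified):
        if d != v:
            accepted.append(v)  # verifier's token replaces the rejected draft token
            break
        accepted.append(d)
    return accepted


def speculative_step(
    prefix: Sequence[int],
    draft_fn: Callable[[Sequence[int], int], List[int]],
    verify_fn: Callable[[Sequence[int], Sequence[int]], List[int]],
    k: int,
) -> List[int]:
    """One step: propose k draft tokens, check them all in one verifier call."""
    draft = draft_fn(prefix, k)
    # verified[i] is the verifier's token for position i given prefix + draft[:i].
    verified = verify_fn(prefix, draft)
    return list(prefix) + accept_draft(draft, verified)


# Toy stand-ins (assumptions for illustration only).
def toy_verify(prefix: Sequence[int], draft: Sequence[int]) -> List[int]:
    """Toy verifier: the 'correct' next token is (previous token + 1) mod 10."""
    context = list(prefix)
    out = []
    for d in draft:
        out.append((context[-1] + 1) % 10)
        context.append(d)
    return out


def toy_draft(prefix: Sequence[int], k: int) -> List[int]:
    """Toy drafter: follows the same rule but gets the third token wrong."""
    tokens = [(prefix[-1] + 1 + i) % 10 for i in range(k)]
    if k >= 3:
        tokens[2] = 9
    return tokens


result = speculative_step([1, 2], toy_draft, toy_verify, k=4)
```

Here the first two draft tokens are accepted, the third is rejected and replaced by the verifier's token, and the fourth is discarded, so a single verifier call advances the sequence by three tokens instead of one.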

efficiency, architecture

Key Terms

speculative-decoding, diffusion-language-models, masked-language-modeling, blockwise-decoding, kv-cache