Accelerating Speculative Decoding with Block Diffusion Draft Trees

Liran Ringel, Yaniv Romano | April 14, 2026 | arXiv

Key Takeaway

DDTree improves speculative decoding by constructing multiple draft token paths from a single forward pass of a diffusion drafter and verifying them together, which gives faster inference than methods that verify only one path at a time.

Summary

This paper speeds up language model inference by improving how draft models propose token sequences. Instead of proposing a single sequence, DDTree builds a tree of multiple possible continuations from a diffusion-based drafter, then verifies them all at once. This increases the number of tokens accepted per verification step, making inference faster.
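As a rough illustration of how a tree of continuations could be assembled from one drafter pass, the sketch below assumes the drafter returns an independent token distribution for each draft position and scores a path by the product of its per-position probabilities. The function name `build_draft_tree`, the `budget` parameter, and the best-first selection rule are assumptions made for this example; the summary does not specify the paper's actual construction procedure.

```python
import heapq
import math


def build_draft_tree(position_probs, budget):
    """Select up to `budget` tree nodes in best-first order (illustrative only).

    position_probs[d] maps token -> probability for draft position d.
    These are hypothetical per-position marginals from one drafter pass,
    and every probability is assumed to be strictly positive.

    Returns a list of (parent_index, token, log_score) tuples, where
    parent_index == -1 marks a child of the root (the current context).
    """
    # Rank each position's candidates from most to least probable.
    ranked = [sorted(p.items(), key=lambda kv: -kv[1]) for p in position_probs]
    nodes = []
    heap = []  # entries: (-log_score, tie_breaker, parent_index, depth, rank)
    counter = 0
    if ranked and ranked[0]:
        heap.append((-math.log(ranked[0][0][1]), counter, -1, 0, 0))

    while heap and len(nodes) < budget:
        neg_score, _, parent, depth, rank = heapq.heappop(heap)
        token, prob = ranked[depth][rank]
        score = -neg_score
        index = len(nodes)
        nodes.append((parent, token, score))

        # Next-best sibling: same parent and depth, one rank lower.
        if rank + 1 < len(ranked[depth]):
            parent_score = score - math.log(prob)
            sibling_prob = ranked[depth][rank + 1][1]
            counter += 1
            heapq.heappush(
                heap,
                (-(parent_score + math.log(sibling_prob)), counter,
                 parent, depth, rank + 1),
            )

        # Best child: top-ranked token at the next draft position.
        if depth + 1 < len(ranked) and ranked[depth + 1]:
            child_prob = ranked[depth + 1][0][1]
            counter += 1
            heapq.heappush(
                heap,
                (-(score + math.log(child_prob)), counter, index, depth + 1, 0),
            )
    return nodes


# Toy example with two draft positions and a budget of four nodes.
toy_probs = [{"a": 0.6, "b": 0.4}, {"c": 0.9, "d": 0.1}]
tree = build_draft_tree(toy_probs, budget=4)
tokens = [token for _, token, _ in tree]
parents = [parent for parent, _, _ in tree]
```

Because a child's score can never exceed its parent's, and a lower-ranked sibling's score can never exceed a higher-ranked one's, each selected node has its parent already in the tree. A single path of the same length would cover only one of the two branches kept here.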

efficiency

Key Terms

speculative decoding, block diffusion language model, draft tree, ancestor-only attention mask
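To make the last term concrete: when a flattened draft tree is verified in one pass, each tree node should attend only to itself and its ancestors (plus the shared context), so that sibling branches do not see each other. The sketch below builds such a mask from a parent-index list. The function name `ancestor_only_mask` and the list-of-parents encoding are assumptions for illustration, not the paper's implementation.

```python
def ancestor_only_mask(parents):
    """Return mask[i][j] == True iff node j is node i or an ancestor of i.

    parents[i] is the index of node i's parent, or -1 when the parent is
    the root (the already-accepted context, which every node can attend to
    and which is therefore left out of this mask).
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        # Walk up the tree from node i, marking each node on the way.
        while j != -1:
            mask[i][j] = True
            j = parents[j]
    return mask


# Hypothetical tree with two branches: 0 -> 1 and 2 -> 3.
example_parents = [-1, 0, -1, 2]
example_mask = ancestor_only_mask(example_parents)
```

In this example node 3 attends to nodes 2 and 3 only, so the tokens on the other branch (nodes 0 and 1) cannot influence its verification.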