DDTree improves speculative decoding by constructing multiple draft token paths from a single diffusion model forward pass and then verifying them together, achieving faster inference than methods that verify only one path at a time.
This paper speeds up language model inference by improving how draft models propose token sequences. Instead of proposing a single sequence, DDTree builds a tree of multiple possible continuations from a diffusion-based drafter, then verifies them all at once. Because several candidate paths are checked in each verification step, more tokens are accepted per step, which makes inference faster.
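
To make the idea concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm. It assumes the drafter returns one probability table per draft position from a single forward pass, keeps the top-k tokens at each position, merges the resulting paths into a prefix tree, and then accepts the longest branch that matches a greedy target model. All names (`build_draft_paths`, `build_tree`, `verify_tree`, `toy_target`) are hypothetical, and the target is simulated by a per-step function call, whereas a real system would score every tree node in one target forward pass.

```python
from itertools import product


def build_draft_paths(position_probs, top_k=2):
    """Enumerate candidate paths from per-position token probabilities.

    position_probs: list of {token: prob} dicts, one per draft position
    (assumed here to come from a single drafter forward pass).
    """
    per_position = [
        sorted(probs, key=probs.get, reverse=True)[:top_k]
        for probs in position_probs
    ]
    return [list(path) for path in product(*per_position)]


def build_tree(paths):
    """Merge paths into a prefix tree (nested dicts) so shared prefixes are stored once."""
    root = {}
    for path in paths:
        node = root
        for token in path:
            node = node.setdefault(token, {})
    return root


def verify_tree(tree, target_next_token, prefix):
    """Greedy verification: at each depth, follow the child that matches the target's token."""
    accepted = []
    node = tree
    while node:  # stop when we reach a leaf (empty dict)
        token = target_next_token(prefix + accepted)
        if token not in node:
            break  # no drafted branch matches the target here
        accepted.append(token)
        node = node[token]
    return accepted


# Toy data (made up for illustration): three draft positions.
draft_probs = [
    {"a": 0.6, "b": 0.3, "c": 0.1},
    {"x": 0.5, "y": 0.4, "z": 0.1},
    {"m": 0.7, "n": 0.3},
]

# Toy stand-in for the target model: it always continues a fixed sequence.
REFERENCE = ["b", "y", "m", "q"]


def toy_target(sequence):
    return REFERENCE[len(sequence)]


tree = build_tree(build_draft_paths(draft_probs, top_k=2))
single_path = build_tree(build_draft_paths(draft_probs, top_k=1))

print(verify_tree(tree, toy_target, []))         # ['b', 'y', 'm']
print(verify_tree(single_path, toy_target, []))  # []
```

In this toy case the single most likely path (`a`, `x`, `m`) is rejected at the first token, while the tree contains the branch the target actually prefers, so three tokens are accepted in the same step. That is the effect the summary describes: more candidates per verification step, more accepted tokens per step.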