Low-confidence token predictions in diffusion language models carry useful lookahead information for retrieval: you can use them to fetch better evidence mid-generation, which improves results on reasoning tasks while preserving the speed advantage of parallel decoding.
This paper shows that discrete diffusion language models (which generate text by gradually denoising masked tokens in parallel) produce useful intermediate predictions that can guide retrieval.
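
To make the idea concrete, here is a minimal sketch of how tentative predictions at still-masked positions could be folded into a retrieval query during denoising. It is an illustration under stated assumptions, not the paper's implementation: the `Position` type, the `build_query` and `retrieve` functions, the `floor` threshold, and the word-overlap scoring are all hypothetical names and choices introduced here.

```python
from dataclasses import dataclass


@dataclass
class Position:
    token: str         # top-1 predicted token at this position
    confidence: float  # model probability assigned to that token
    committed: bool    # True once the position has been unmasked


def build_query(positions, floor=0.05):
    """Join committed tokens with tentative predictions for masked positions.

    Assumption: a tentative prediction is kept when its confidence is at
    least `floor`, so weak guesses still contribute but near-noise does not.
    """
    terms = [p.token for p in positions if p.committed or p.confidence >= floor]
    return " ".join(terms)


def retrieve(query, corpus, k=1):
    """Toy retriever: rank documents by distinct words shared with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


# One intermediate denoising step: three tokens are committed,
# two positions are still masked and only have tentative guesses.
step = [
    Position("the", 0.99, True),
    Position("answer", 0.97, True),
    Position("is", 0.95, True),
    Position("canberra", 0.20, False),  # low confidence, but above the floor
    Position("zzz", 0.01, False),       # below the floor, dropped
]
corpus = [
    "Sydney hosts the opera house",
    "Canberra is the capital",
]

query = build_query(step)
top_docs = retrieve(query, corpus)
print(query)
print(top_docs)
```

In a real system the word-overlap retriever would be replaced by whatever retriever the pipeline already uses, and the query would be rebuilt at each denoising step as more positions are committed.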