Taming Outlier Tokens in Diffusion Transformers

Xiaoyu Wu, Yifei Wang, Tsu-Jui Fu, Liang-Chieh Chen, Zhe Gan et al. | May 6, 2026 | arXiv

Key Takeaway

Outlier tokens in diffusion transformers aren't just extreme values but represent corrupted local information; controlling them with register tokens significantly improves image generation quality.

Summary

This paper identifies and fixes a problem in Diffusion Transformers in which certain tokens develop unusually high values that degrade image quality. The authors show that this happens in both the image encoder and the generation model itself, and they propose Dual-Stage Registers, a technique that uses learnable tokens to stabilize these problematic values and improve image generation.
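
To make the idea of register tokens concrete, here is a minimal NumPy sketch of the general mechanism: extra tokens are appended to the patch sequence before the transformer blocks and discarded afterwards. It is not the authors' implementation. The function names, the median-plus-MAD norm threshold used to flag outliers, and the zero-initialized registers standing in for learned parameters are all assumptions made for illustration, and the transformer blocks that would sit between appending and dropping are omitted.

```python
import numpy as np

def find_outlier_tokens(tokens, k=3.0):
    """Flag tokens whose L2 norm exceeds median + k * MAD.

    This threshold is an assumed heuristic, not the paper's criterion.
    """
    norms = np.linalg.norm(tokens, axis=-1)
    median = np.median(norms)
    mad = np.median(np.abs(norms - median)) + 1e-8
    return np.flatnonzero(norms > median + k * mad)

def append_registers(tokens, registers):
    """Concatenate register tokens after the patch tokens (sequence axis)."""
    return np.concatenate([tokens, registers], axis=0)

def drop_registers(tokens, num_registers):
    """Discard the trailing register tokens so only patch tokens remain."""
    return tokens[:-num_registers] if num_registers > 0 else tokens

rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))  # 16 patch tokens, hidden width 8
patches[5] *= 50.0                  # inject one artificial high-norm token
registers = np.zeros((4, 8))        # stand-in for learned parameters

outliers = find_outlier_tokens(patches)

# In a real model, attention layers would run on `seq` here, giving the
# registers a place to absorb global information instead of patch tokens.
seq = append_registers(patches, registers)
out = drop_registers(seq, num_registers=4)
```

In this toy setup the registers pass through untouched; per the summary, the paper applies registers at two stages, the image encoder and the diffusion model, which this single-stage sketch does not attempt to reproduce.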

architecture efficiency evaluation

Key Terms

diffusion-transformer outlier-tokens register-tokens vision-transformer