Taming Outlier Tokens in Diffusion Transformers — ThinkLLM