Timestep embeddings in diffusion models may be redundant: models can achieve competitive image quality without them by inferring the noise scale directly from the corruption pattern of the input.
This paper asks whether diffusion models actually need explicit timestep embeddings for denoising. The authors show, theoretically and empirically, that removing timestep information entirely does not significantly hurt performance on image generation tasks, and that models can implicitly infer the noise level from the corrupted input alone.
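A toy example makes it concrete that the noise level can be read off the corrupted input. The sketch below is not the authors' method. It assumes a variance-preserving forward process, x_t = sqrt(ᾱ)·x_0 + sqrt(1 − ᾱ)·ε, and a smooth synthetic "image". In place of a learned network it uses a classical neighbour-difference noise estimator. All function names are hypothetical.

```python
import math
import random


def make_smooth_image(h: int, w: int) -> list[list[float]]:
    """Hypothetical 'clean' image: a smooth 2-D pattern with values in [-1, 1]."""
    return [[math.sin(2 * math.pi * i / h) * math.cos(2 * math.pi * j / w)
             for j in range(w)] for i in range(h)]


def corrupt(x0: list[list[float]], alpha_bar: float,
            rng: random.Random) -> list[list[float]]:
    """Assumed VP forward step: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[a * v + s * rng.gauss(0.0, 1.0) for v in row] for row in x0]


def estimate_noise_std(x: list[list[float]]) -> float:
    """Estimate the noise std from the input alone (no timestep given).

    Horizontal neighbour differences cancel most of a smooth signal, while
    the difference of two independent N(0, s^2) samples has variance 2 * s^2,
    so sqrt(mean(d^2) / 2) approximates s (slightly biased upward by the
    residual signal gradient).
    """
    diffs = [row[j + 1] - row[j] for row in x for j in range(len(row) - 1)]
    mean_sq = sum(d * d for d in diffs) / len(diffs)
    return math.sqrt(mean_sq / 2.0)


def estimate_alpha_bar(noise_std: float) -> float:
    """Invert s = sqrt(1 - alpha_bar) under the assumed VP schedule, clipped to [0, 1]."""
    return min(1.0, max(0.0, 1.0 - noise_std ** 2))


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = make_smooth_image(64, 64)
    for alpha_bar in (0.99, 0.9, 0.5, 0.1):
        est = estimate_noise_std(corrupt(x0, alpha_bar, rng))
        print(f"alpha_bar={alpha_bar:.2f}  true_std={math.sqrt(1 - alpha_bar):.3f}  "
              f"est_std={est:.3f}  est_alpha_bar={estimate_alpha_bar(est):.3f}")
```

A trained denoiser contains no hand-written estimator like this one. The sketch only shows that, for inputs with smooth structure, the corruption level leaves a statistical signature that a network could in principle learn to read without being told the timestep.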