A technique where a model learns to gradually remove random noise from data to reconstruct meaningful content, used as an alternative to traditional token prediction.