A training approach where a model learns to reconstruct clean audio from noisy versions, making it better at understanding speech in real-world conditions.