A training approach where the model learns to reconstruct clean audio from corrupted or noisy versions, improving its ability to extract meaningful features.