The process of removing duplicate or near-duplicate examples from training data to improve training efficiency and reduce overfitting to repeated content.
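
As a rough illustration of the definition above, the sketch below drops exact duplicates by hashing normalized text and drops near-duplicates by Jaccard similarity over word trigrams. The function names, the normalization rules, and the `threshold=0.8` default are assumptions chosen for this example, not a standard. The pairwise comparison is quadratic in the number of kept examples, so it only suits small datasets; larger pipelines commonly use approximate methods such as MinHash with locality-sensitive hashing instead.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text: str, n: int = 3) -> set:
    """Return the set of word n-grams (one tuple of all words if fewer than n)."""
    words = text.split()
    if len(words) < n:
        return {tuple(words)}
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(examples, threshold=0.8):
    """Keep the first occurrence of each example; drop exact and near duplicates.

    `threshold` is an illustrative cutoff (an assumption), not a recommended value.
    """
    seen_hashes = set()   # hashes of normalized text, for exact-duplicate checks
    kept = []             # surviving examples, in original order
    kept_shingles = []    # shingle sets of the surviving examples
    for example in examples:
        norm = normalize(example)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        sh = shingles(norm)
        if any(jaccard(sh, other) >= threshold for other in kept_shingles):
            continue  # near-duplicate of an example already kept
        seen_hashes.add(digest)
        kept.append(example)
        kept_shingles.append(sh)
    return kept


data = [
    "the quick brown fox jumps over the lazy dog",
    "The quick  brown fox jumps over the lazy dog",        # exact after normalization
    "the quick brown fox jumps over the lazy dog today",   # near-duplicate
    "a completely different sentence about training data",
]
cleaned = deduplicate(data)
```

Here `cleaned` retains the first and last strings: the second matches the first once case and whitespace are normalized, and the third shares 7 of 8 distinct trigrams with the first (Jaccard 0.875), which is above the assumed threshold.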