A training technique in which knowledge from multiple models is combined and compressed into a single, smaller model that is cheaper to run.
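The definition above can be made concrete with a minimal sketch. The entry does not say how the models' knowledge is combined, so this example assumes one common choice: averaging the temperature-softened predictions of several "teacher" models and training a smaller "student" to match that average. The function names, the temperature value, and the logits are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def combined_teacher_targets(teacher_logits, temperature=2.0):
    """Combine several teachers by averaging their softened predictions.

    Averaging is an assumption of this sketch, not something the
    definition prescribes.
    """
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    n_classes = len(probs[0])
    return [sum(p[i] for p in probs) / len(probs) for i in range(n_classes)]


def distillation_loss(student_logits, targets, temperature=2.0):
    """Cross-entropy between the combined targets and the student's output."""
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(targets, student))


# Hypothetical logits from three teachers for one input with three classes.
teachers = [[2.0, 1.0, 0.1], [1.8, 1.2, 0.0], [2.2, 0.7, 0.3]]
targets = combined_teacher_targets(teachers)

# A student that agrees with the teachers versus one that disagrees.
close_student = [2.0, 1.0, 0.1]
far_student = [0.1, 1.0, 2.0]

print("combined targets:", targets)
print("loss (agreeing student):   ", distillation_loss(close_student, targets))
print("loss (disagreeing student):", distillation_loss(far_student, targets))
```

In an actual training loop the student's parameters would be updated to reduce this loss over many inputs. The sketch only shows how the combined target is formed and that a student closer to the teachers receives a lower loss.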