Choosing a subset of training data based on quality or relevance metrics rather than using all available data.