Testing whether a model trained on one dataset generalizes to perform the same task on a different dataset.