A statistical test that compares two models evaluated on the same set of examples, to determine whether the observed difference in their performance is statistically significant rather than due to chance.
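The definition does not name a specific test, so the sketch below is only one possible instance: an exact McNemar test, which assumes each model's output on each example can be scored as simply correct or incorrect. The function name `mcnemar_exact` and the sample data are hypothetical, chosen for illustration.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired per-example correctness.

    correct_a, correct_b: sequences of 0/1 (or bool) flags, one per example,
    in the same example order for both models. Returns a p-value.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same examples")

    # Only discordant pairs (exactly one model correct) carry information.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return 1.0  # the models never disagree: no evidence of a difference

    # Under the null hypothesis, each discordant pair favours either model
    # with probability 0.5, so the smaller count follows Binomial(n, 0.5).
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)  # double the tail for a two-sided test


# Hypothetical per-example correctness for two models on the same 8 examples.
model_a = [1, 1, 1, 0, 1, 1, 1, 1]
model_b = [1, 0, 0, 0, 1, 0, 1, 0]
print(f"p-value: {mcnemar_exact(model_a, model_b):.4f}")
```

Because the test uses only the examples on which the models disagree, it exploits the pairing that comes from sharing one test set. For real-valued metrics, a paired bootstrap or permutation test would be a more natural choice.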