Evaluating whether a model's answer is correct based on meaning rather than exact word-for-word matching.