Using a separate model to evaluate candidate outputs and reject those that contain errors, improving the quality of the final answer.
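As a rough illustration, the idea can be sketched as a filter over candidate outputs. Everything named here is hypothetical: `first_accepted`, `toy_candidates`, and `toy_verifier` are illustrative stand-ins, and in practice both the generator and the evaluator would be model calls rather than the toy functions shown.

```python
from typing import Callable, Iterable, Optional


def first_accepted(
    candidates: Iterable[str],
    verifier: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate the verifier accepts, or None if all are rejected."""
    for candidate in candidates:
        if verifier(candidate):
            return candidate
    return None


# Hypothetical stand-in for a generator model: proposed answers to "17 + 25".
def toy_candidates():
    return ["41", "forty-two", "42", "43"]


# Hypothetical stand-in for the separate evaluating model: it checks the
# arithmetic independently and rejects anything malformed or wrong.
def toy_verifier(answer: str) -> bool:
    return answer.isdigit() and int(answer) == 17 + 25


result = first_accepted(toy_candidates(), toy_verifier)
print(result)  # prints "42"
```

Returning `None` when every candidate is rejected is one possible design choice; a caller could instead request more candidates or fall back to the evaluator's highest-scoring output.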