Using a language model, rather than human reviewers, to automatically evaluate or score the outputs of other AI systems.
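
As a rough illustration of the idea, the sketch below builds a grading prompt, sends it to a judge model, and parses a numeric score from the reply. Everything here is an assumption made for the example, not part of the definition: the prompt wording, the 1 to 5 scale, the `Score: <number>` reply format, and the names `judge`, `call_model`, and `fake_judge_model`. The model call is passed in as a plain function so that no particular provider's API is implied.

```python
import re
from typing import Callable, Optional

# Hypothetical grading prompt; the wording and the 1-5 scale are assumptions.
JUDGE_PROMPT = (
    "You are grading an AI system's answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Rate the answer from 1 (poor) to 5 (excellent). "
    "Reply in the form 'Score: <number>'."
)


def judge(question: str, answer: str, call_model: Callable[[str], str]) -> Optional[int]:
    """Ask a judge model to score an answer.

    `call_model` is any function that takes a prompt string and returns the
    model's text reply. Returns None when no valid score can be parsed.
    """
    prompt = JUDGE_PROMPT.format(question=question, answer=answer)
    reply = call_model(prompt)
    # Accept only a single digit from 1 to 5 after "Score:".
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None


def fake_judge_model(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical); always gives the same reply."""
    return "Score: 4"


if __name__ == "__main__":
    print(judge("What is 2 + 2?", "4", fake_judge_model))
```

In practice `fake_judge_model` would be replaced by a call to whichever model serves as the judge. Returning `None` for an unparseable reply keeps malformed judgments from being silently counted as scores.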