A frozen language model (one whose weights are not updated) used to evaluate and score the outputs of other models against predefined criteria.
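As a rough illustration of the definition above, the sketch below scores one candidate output against a list of criteria. Everything named here is hypothetical and not taken from the source: `Criterion`, `build_prompt`, `parse_score`, `score_output`, the 1 to 5 scale, and the prompt wording. The `stub_judge` function is a toy rule standing in for a real frozen model, so that the example runs without any model or network access.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: the frozen judge is any callable that maps a prompt to a text reply.
# Its weights are never touched here; the code only calls it.
JudgeFn = Callable[[str], str]


@dataclass(frozen=True)
class Criterion:
    """One predefined criterion the judge scores against (hypothetical type)."""
    name: str
    description: str


def build_prompt(criterion: Criterion, candidate: str) -> str:
    """Format a scoring request for a single criterion."""
    return (
        f"Criterion: {criterion.name} - {criterion.description}\n"
        f"Candidate output:\n{candidate}\n"
        "Reply with a single integer score from 1 to 5."
    )


def parse_score(reply: str, low: int = 1, high: int = 5) -> int:
    """Take the first integer in the reply and clamp it to [low, high]."""
    match = re.search(r"[0-9]+", reply)
    if match is None:
        raise ValueError(f"no score found in judge reply: {reply!r}")
    return max(low, min(high, int(match.group())))


def score_output(
    judge: JudgeFn, criteria: List[Criterion], candidate: str
) -> Dict[str, int]:
    """Ask the judge for one score per criterion."""
    return {c.name: parse_score(judge(build_prompt(c, candidate))) for c in criteria}


def stub_judge(prompt: str) -> str:
    """Toy stand-in, NOT a real model: more words in the candidate, higher score."""
    candidate = prompt.split("Candidate output:\n", 1)[1].rsplit("\nReply", 1)[0]
    return f"Score: {min(5, 1 + len(candidate.split()))}"


if __name__ == "__main__":
    criteria = [Criterion("fluency", "reads naturally")]
    print(score_output(stub_judge, criteria, "hello world"))
```

In practice `stub_judge` would be replaced by a call to whichever frozen model serves as the judge. The judge is kept behind a plain callable so the scoring logic does not depend on any particular model API.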