LLMs have a latent ability to predict how external judges will score their outputs, and minimal training data is enough to unlock it. This suggests that self-evaluation is a matter of revealing existing knowledge rather than teaching a new skill.
This paper shows that base language models already have a latent ability to predict how external judges will score their outputs. The authors introduce SEE, a training method that surfaces this ability using just 160 examples, 31x fewer than standard approaches. SEE combines reinforcement learning with distillation to improve both answer quality and calibration accuracy.
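The summary does not say how agreement between a model's self-assigned scores and the external judge's scores is measured. As a minimal sketch, assuming both are numeric scores on the same scale, one could compare them with mean absolute error and Pearson correlation. The function names and the sample scores below are hypothetical and are not taken from the paper.

```python
from math import sqrt


def mean_absolute_error(self_scores, judge_scores):
    """Average absolute gap between self-predicted and judge scores (lower is better)."""
    if len(self_scores) != len(judge_scores) or not self_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    total = sum(abs(s - j) for s, j in zip(self_scores, judge_scores))
    return total / len(self_scores)


def pearson_correlation(self_scores, judge_scores):
    """Linear correlation between the two score lists; returns 0.0 if either is constant."""
    if len(self_scores) != len(judge_scores) or not self_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(self_scores)
    mean_self = sum(self_scores) / n
    mean_judge = sum(judge_scores) / n
    cov = sum((s - mean_self) * (j - mean_judge)
              for s, j in zip(self_scores, judge_scores))
    var_self = sum((s - mean_self) ** 2 for s in self_scores)
    var_judge = sum((j - mean_judge) ** 2 for j in judge_scores)
    if var_self == 0 or var_judge == 0:
        return 0.0
    return cov / sqrt(var_self * var_judge)


if __name__ == "__main__":
    # Hypothetical scores: what the model predicted vs. what a judge assigned.
    predicted = [7.0, 4.0, 9.0, 6.0]
    judged = [8.0, 4.0, 9.0, 5.0]
    print("MAE:", mean_absolute_error(predicted, judged))
    print("Pearson r:", pearson_correlation(predicted, judged))
```

A low error together with a high correlation would indicate that the model's self-assessment tracks the judge closely. Other calibration measures may be more appropriate depending on how the judge's scores are defined.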