Training LLMs to accurately self-assess their performance creates a powerful RL signal that improves both calibration and accuracy—models that know what they don't know become more reliable and better at learning.
This paper introduces reinforcement learning with metacognitive feedback (RLMF), a method that trains language models to accurately judge their own performance and express uncertainty faithfully.