Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

Gabrielle Kaili-May Liu, Avi Caciularu, Gal Yona, Idan Szpektor, Arman Cohan | June 30, 2026 | arXiv

Key Takeaway

Training LLMs to accurately assess their own performance creates a strong RL signal that improves both calibration and accuracy: models that know what they don't know become more reliable and better at learning.

Summary

This paper introduces reinforcement learning with metacognitive feedback (RLMF), a method that trains language models to accurately judge their own performance and express uncertainty faithfully.
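As a rough illustration of what "accurately judging one's own performance" could mean as a training signal, the sketch below scores a stated confidence against whether the answer turned out to be correct, and measures calibration over a batch. This is an assumption-laden sketch, not the paper's method: the summary does not specify the RLMF reward, so `brier_reward` (a Brier-style score) and `expected_calibration_error` are hypothetical stand-ins chosen because they are standard ways to quantify calibration.

```python
from typing import Sequence


def brier_reward(confidence: float, correct: bool) -> float:
    """Hypothetical reward (not from the paper): 1 minus the squared gap
    between the model's stated confidence and the actual outcome."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    outcome = 1.0 if correct else 0.0
    return 1.0 - (confidence - outcome) ** 2


def expected_calibration_error(
    confidences: Sequence[float],
    outcomes: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Standard binned ECE: weighted mean of |avg confidence - accuracy| per bin."""
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have the same length")
    if not confidences:
        return 0.0

    # Group (confidence, outcome) pairs into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, outcomes):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, 1.0 if ok else 0.0))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

Under this stand-in reward, a confident correct answer scores highest, a confident wrong answer scores lowest, and an honest 0.5 sits in between, so overclaiming is penalized more than hedging. The actual feedback signal used by RLMF may differ substantially.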

alignment, training, evaluation

Key Terms

metacognitive features, calibration, preference optimization, active learning, hallucination