Large reasoning models frequently express confidence that does not match their actual uncertainty. This mismatch is a critical problem for deployment in high-stakes applications, and current evaluation methods fail to capture it.
This paper introduces a framework for measuring whether large reasoning models (LRMs) accurately express their internal confidence in language. The authors find that LRMs often claim confidence they do not actually have, and that existing methods for measuring this gap between stated and internal confidence work poorly on long reasoning traces.