BAS: A Decision-Theoretic Approach to Evaluating Large Language Model Confidence

Sean Wu, Fredrik K. Gustafsson, Edward Phillips, Boyan Gao, Anshul Thakur et al.|April 3, 2026arXiv

Key Takeaway

LLMs often express high confidence in wrong answers, and standard evaluation metrics miss this problem—BAS provides a decision-focused alternative that rewards models for knowing when to say 'I don't know' instead of guessing confidently.

Summary

This paper introduces BAS (Behavioral Alignment Score), a new metric for measuring whether LLMs' confidence levels are actually useful for deciding when to abstain from answering. Unlike standard metrics that treat all errors equally, BAS penalizes overconfident wrong answers more heavily, reflecting real-world decision-making where false confidence is costlier than admitting uncertainty.

evaluation safety alignment

Key Terms

calibration abstention overconfidence proper-scoring-rule decision-theoretic