A single metric based on the model's confidence distribution at the first answer token can reliably detect hallucinations without expensive multi-sample generation, making it a practical baseline for production systems.
This paper shows that checking a language model's confidence on just the first token of an answer can detect hallucinations about as accurately as methods that generate multiple answers and compare them for consistency. The approach is faster and simpler, requiring only a single forward pass instead of repeated sampling.
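The single-pass idea can be sketched as follows. This is a minimal illustration, not the paper's exact metric: it assumes access to the model's logit vector at the first answer token, and the entropy score and threshold here are hypothetical stand-ins that would be tuned on validation data in practice.

```python
import numpy as np

def first_token_scores(logits):
    """Confidence scores from the logit vector at the first answer token."""
    z = logits - logits.max()                        # numerical stability
    p = np.exp(z) / np.exp(z).sum()                  # softmax probabilities
    max_prob = float(p.max())                        # top-token probability
    entropy = float(-(p * np.log(p + 1e-12)).sum())  # distribution entropy
    return max_prob, entropy

def flag_hallucination(logits, entropy_threshold=2.0):
    """Flag a likely hallucination when the first-token distribution is diffuse.

    The threshold is illustrative only; a real system would calibrate it
    on labeled examples.
    """
    _, entropy = first_token_scores(logits)
    return entropy > entropy_threshold

# Usage: a sharply peaked distribution (model is confident) vs. a nearly
# flat one (model is uncertain) over a 100-token vocabulary.
confident = np.full(100, -10.0)
confident[0] = 5.0
uncertain = np.zeros(100)

flag_hallucination(confident)  # peaked distribution, low entropy -> False
flag_hallucination(uncertain)  # flat distribution, high entropy  -> True
```

Because the score comes from a single forward pass, it costs no more than generating the answer itself, in contrast to sampling-based consistency checks that multiply inference cost by the number of samples.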