Hallucination detection improves when you combine a model's internal uncertainty signals with its own explicit self-judgments and require that the two logically agree; this dual-view approach catches more false claims than either signal alone.
This paper tackles hallucination detection in large language models by combining two approaches: analyzing internal neural patterns and extracting explicit self-judgments from the model. The key innovation is a framework that treats these as logically connected signals: if the model's internal states indicate confidence in a claim and its explicit self-judgment also marks that claim as correct, the two signals should agree, and a mismatch points to a likely hallucination.
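The paper's exact scoring and consistency mechanism is not spelled out above, so the following is only a minimal sketch of the dual-view idea: combine an internal confidence score (e.g., from a hidden-state probe or token-level entropy) with the model's explicit self-judgment probability, and flag a claim when either view is weak or the two views disagree. All names, thresholds, and the simple disagreement rule are illustrative assumptions, not the paper's actual method.

```python
from dataclasses import dataclass


@dataclass
class ClaimSignals:
    """Two views of the same claim (field names are illustrative)."""
    internal_confidence: float  # probe/entropy-based score from hidden states, in [0, 1]
    self_judgment: float        # model's explicit "is this claim correct?" probability, in [0, 1]


def flag_hallucination(signals: ClaimSignals,
                       threshold: float = 0.5,
                       disagreement_margin: float = 0.3) -> bool:
    """Flag a claim as a likely hallucination.

    A claim is suspect if either view scores it below `threshold`, or if the
    two views disagree by more than `disagreement_margin` (a crude stand-in
    for the consistency constraint that the signals should logically align).
    """
    low_internal = signals.internal_confidence < threshold
    low_self = signals.self_judgment < threshold
    inconsistent = abs(signals.internal_confidence - signals.self_judgment) > disagreement_margin
    return low_internal or low_self or inconsistent


# Internal signal is confident but the self-judgment is not, so the claim is flagged.
print(flag_hallucination(ClaimSignals(internal_confidence=0.9, self_judgment=0.3)))   # True
# Both views agree the claim is correct, so it passes.
print(flag_hallucination(ClaimSignals(internal_confidence=0.85, self_judgment=0.8)))  # False
```

In practice the combination would likely be learned rather than thresholded by hand, but the sketch captures the core intuition: a single view can be fooled, while requiring agreement between an internal signal and a self-judgment narrows the space of claims that slip through.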