Vision-language models often hallucinate objects that are not present in the image. This paper argues that using attention weights to detect such hallucinations is fundamentally unreliable, because hidden confounders such as token position influence both attention patterns and hallucination rates. HaloProbe's Bayesian approach separates external and internal signals, detecting hallucinations more reliably and mitigating them without degrading model performance.
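The confounding argument can be illustrated with a toy simulation (a hedged sketch, not HaloProbe's actual method — the variable names and the linear/Bernoulli generative model are assumptions for illustration): if token position drives both attention mass and hallucination risk, attention appears predictive of hallucination marginally, but the association collapses once position is held roughly constant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy generative model: token position is a confounder that lowers
# attention mass AND raises hallucination probability.
position = rng.uniform(0.0, 1.0, n)                       # normalized token position
attention = 0.8 - 0.5 * position + rng.normal(0, 0.05, n) # later tokens get less attention
halluc = rng.uniform(0.0, 1.0, n) < (0.1 + 0.4 * position)  # later tokens hallucinate more

# Marginally, low attention looks like a hallucination signal...
r_marginal = np.corrcoef(attention, halluc)[0, 1]

# ...but within a narrow position band the association vanishes,
# revealing that position, not attention, carried the signal.
band = (position > 0.45) & (position < 0.55)
r_within = np.corrcoef(attention[band], halluc[band])[0, 1]

print(f"marginal corr: {r_marginal:.2f}, within-band corr: {r_within:.2f}")
```

Stratifying on the confounder is the simplest diagnostic; the paper's Bayesian decomposition plays an analogous role, separating the signal attributable to the input from the model's internal state.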