Vision-Default, Prior-Override: Causal Mechanisms of Perception-Knowledge Conflict in Vision-Language Models

Niclas Lietzow, Danielle Bitterman, Carsten Eickhoff, William Rudman, Michal Golovanevsky

June 26, 2026 | arXiv

Key Takeaway

Vision-language models have a sparse, identifiable causal circuit that controls whether they trust visual input or stored knowledge: removing just a few attention heads flips the model from knowledge-based to vision-based answers in most cases.

Summary

This paper examines how vision-language models choose between visual evidence and memorized knowledge when the two conflict. Using activation analysis, the researchers identified a small set of attention heads (2.5-4.8% of heads) that acts as a causal switch: removing them makes the models rely on what the image shows rather than on what they have memorized.
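To make "removing attention heads" concrete, here is a minimal sketch of zero-ablation on a toy multi-head attention layer in plain Python. It is an illustration only, not the paper's code or models: the random weights, the layer sizes, the helper names (`head_output`, `attention_layer`), and the choice of zero-ablation as the removal method are all assumptions made for this example. It relies on the fact that each head's output is added separately into the layer output, so dropping a head removes exactly that head's contribution.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def head_output(x, head):
    """Output of one attention head, projected back to d_model."""
    w_q, w_k, w_v, w_o = head
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(w_q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(q, k_t)]
    return matmul(matmul(softmax_rows(scores), v), w_o)

def attention_layer(x, heads, ablate=frozenset()):
    """Sum of per-head outputs; heads listed in `ablate` are zero-ablated."""
    out = [[0.0] * len(x[0]) for _ in x]
    for idx, head in enumerate(heads):
        if idx in ablate:
            continue  # hypothetical zero-ablation: this head contributes nothing
        contrib = head_output(x, head)
        out = [[o + c for o, c in zip(o_row, c_row)]
               for o_row, c_row in zip(out, contrib)]
    return out

def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

# Toy sizes and random weights, chosen only for illustration.
rng = random.Random(0)
seq_len, d_model, d_head, n_heads = 3, 8, 4, 4
x = rand_matrix(rng, seq_len, d_model)
heads = [
    (rand_matrix(rng, d_model, d_head),  # w_q
     rand_matrix(rng, d_model, d_head),  # w_k
     rand_matrix(rng, d_model, d_head),  # w_v
     rand_matrix(rng, d_head, d_model))  # w_o
    for _ in range(n_heads)
]

full = attention_layer(x, heads)
ablated = attention_layer(x, heads, ablate={2})
only_head_2 = attention_layer(x, heads, ablate={0, 1, 3})
```

In this sketch, `full` equals `ablated` plus `only_head_2` up to floating-point error, which is what makes per-head interventions easy to reason about. In a real vision-language model one would compare the model's answers with and without the candidate heads rather than raw layer outputs.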

multimodal evaluation

Key Terms

activation-patching, residual-stream, attention-head, vision-language-model, mechanistic-interpretability