Formalizing the Binding Problem

Lianghuan Huang, Yihao Li, Saeed Salehi, Yingshan Chang, Ansh Soni et al. | June 2, 2026 | arXiv

Key Takeaway

Vision Transformers struggle with binding, that is, knowing which features belong to the same object. This limitation explains why they fail on tasks where objects share features or occlude one another, which makes binding a measurable and critical component of visual understanding.

Summary

This paper uses information theory to formalize the "binding problem": how AI models know which visual features belong to the same object. The authors develop a probing method to measure binding information in Vision Transformers and show that binding is crucial for understanding scenes with multiple objects, especially when objects share features or overlap.
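
To make the idea of "measuring binding information with a probe" concrete, here is a minimal sketch. It is not the authors' method, whose details this summary does not give. It assumes a hypothetical setup in which each patch token has a known object label, trains a small logistic probe to predict whether two tokens come from the same object, and uses the standard bound that the label entropy minus the probe's cross-entropy is a lower bound on the information the token pair carries about that label. The toy tokens, the function names, and the absolute-difference pair feature are all illustrative assumptions.

```python
import numpy as np


def make_toy_tokens(n_objects=4, tokens_per_object=8, dim=16, noise=0.2, seed=0):
    """Hypothetical stand-in for ViT patch tokens: one noisy cluster per object."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_objects, dim))
    labels = np.repeat(np.arange(n_objects), tokens_per_object)
    tokens = centers[labels] + noise * rng.normal(size=(labels.size, dim))
    return tokens, labels


def pair_features(tokens, labels):
    """Build one symmetric feature vector and one same-object label per token pair."""
    i, j = np.triu_indices(len(tokens), k=1)
    feats = np.abs(tokens[i] - tokens[j])
    same = (labels[i] == labels[j]).astype(float)
    return feats, same


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_probe(x, y, lr=0.1, steps=2000):
    """Plain gradient descent on the logistic loss (no regularization)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        grad = _sigmoid(x @ w + b) - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def probe_information_bits(x, y, w, b):
    """Return (H(Y) - cross-entropy, H(Y)) in bits for the same-object label Y.

    The first value lower-bounds the mutual information between the pair
    features and Y when evaluated on data the probe was not trained on.
    """
    p = np.clip(_sigmoid(x @ w + b), 1e-12, 1.0 - 1e-12)
    cross_entropy = -np.mean(y * np.log2(p) + (1.0 - y) * np.log2(1.0 - p))
    prior = y.mean()
    entropy = -(prior * np.log2(prior) + (1.0 - prior) * np.log2(1.0 - prior))
    return entropy - cross_entropy, entropy


if __name__ == "__main__":
    # Train on one toy scene and evaluate on a different one.
    train_x, train_y = pair_features(*make_toy_tokens(seed=0))
    test_x, test_y = pair_features(*make_toy_tokens(seed=1))
    w, b = fit_logistic_probe(train_x, train_y)
    bits, entropy = probe_information_bits(test_x, test_y, w, b)
    print(f"probe-recoverable binding information: {bits:.3f} of {entropy:.3f} bits")
```

In a real experiment the toy tokens would be replaced by representations taken from a Vision Transformer layer, and a value close to the label entropy would suggest that same-object membership is easy to decode from that layer.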

evaluation architecture reasoning

Key Terms

binding-problem vision-transformer feature-binding probing-method