Transformer Geometry Observatory TGO-II: Representational Similarity Observatory

Kaustubh Kapil, Kishor P. Upla | July 2, 2026 | arXiv

Key Takeaway

Vision Transformers do not learn by making tokens independent. Instead, training increases representational complexity through richer transformations while preserving strong token interactions, which challenges common assumptions about how these models develop.

Summary

This paper uses geometric analysis tools to track how the internal representations of Vision Transformers change during training.

architecture training

Key Terms

centered-kernel-alignment intrinsic-dimensionality representational-geometry token-covariance
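
For readers unfamiliar with the first key term, the following is a minimal sketch of linear centered kernel alignment (CKA) in its commonly used form, which compares two sets of activations recorded on the same inputs. It is not taken from the paper and may differ from the variant the authors use. The function name `linear_cka` and the random arrays standing in for layer activations are illustrative assumptions.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices.

    x: array of shape (n_samples, d1)
    y: array of shape (n_samples, d2)
    Rows must correspond to the same inputs in both matrices.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows")

    # Center each feature (column) so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Hypothetical stand-ins for activations from two layers or checkpoints.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(200, 32))

# A rotated and rescaled copy of acts_a: same geometry, different basis.
q, _ = np.linalg.qr(rng.normal(size=(32, 32)))
acts_b = 3.0 * acts_a @ q

# Independent random activations with no shared structure.
acts_c = rng.normal(size=(200, 32))

print("rotated copy:", linear_cka(acts_a, acts_b))  # 1.0 up to floating-point error
print("independent: ", linear_cka(acts_a, acts_c))  # noticeably below 1.0
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, which is why the rotated and rescaled copy scores as identical while the independent activations do not.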