The hypothesis that neural networks trained on different modalities, such as images and text, converge toward the same underlying representation of reality.