The difference between the visual features that CNNs tend to prioritize (textures) and those that vision transformers tend to prioritize (shapes), which affects the robustness and generalization of each architecture.
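
One common way to quantify this difference is a shape-bias score on cue-conflict images, where the shape of one class is rendered with the texture of another. The sketch below is a minimal illustration under stated assumptions, not a reference implementation: the function name `shape_bias` and the `(predicted, shape_label, texture_label)` record layout are hypothetical, and the model predictions are assumed to have been computed elsewhere.

```python
from typing import Iterable, Tuple

# Each record is (predicted_label, shape_label, texture_label) for one
# cue-conflict image. This layout is an assumption made for illustration.
Record = Tuple[str, str, str]


def shape_bias(records: Iterable[Record]) -> float:
    """Return the fraction of shape decisions among all decisions that
    matched either the shape class or the texture class."""
    shape_hits = 0
    texture_hits = 0
    for predicted, shape_label, texture_label in records:
        if shape_label == texture_label:
            # Not a real conflict: the two cues agree, so skip the image.
            continue
        if predicted == shape_label:
            shape_hits += 1
        elif predicted == texture_label:
            texture_hits += 1
        # Predictions matching neither cue are ignored.
    decided = shape_hits + texture_hits
    if decided == 0:
        raise ValueError("no prediction matched a shape or texture label")
    return shape_hits / decided


if __name__ == "__main__":
    # Toy predictions, made up purely to exercise the function.
    toy = [
        ("cat", "cat", "elephant"),       # shape decision
        ("elephant", "cat", "elephant"),  # texture decision
        ("dog", "cat", "elephant"),       # neither, ignored
        ("cat", "cat", "elephant"),       # shape decision
    ]
    print(f"shape bias: {shape_bias(toy):.3f}")
```

A score near 1 means the model mostly follows shape, and a score near 0 means it mostly follows texture. Comparing the score across architectures on the same image set is one way to make the contrast described above concrete.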