SAEs don't cleanly capture continuous concept structures: they fragment them across many features in ways that obscure the underlying geometric relationships, suggesting interpretability research needs to look for groups of related features rather than individual directions.
Sparse autoencoders (SAEs) are popular tools for finding interpretable features in AI models, but this paper shows they struggle to capture concepts organized as continuous geometric structures (manifolds).
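To make the fragmentation concrete, here is a minimal sketch (not the paper's code; all names, dimensions, and hyperparameters are illustrative assumptions): train a small ReLU SAE with an L1 sparsity penalty on points sampled from a circle embedded in a higher-dimensional space, then check which angles each learned feature fires on.

```python
# A minimal sketch, not the paper's code: all names, dimensions, and
# hyperparameters here are illustrative assumptions.
import torch

torch.manual_seed(0)

# Toy "continuous concept": points on a circle embedded in 20-D space.
n, d, k = 4096, 20, 32                    # samples, ambient dim, SAE width
theta = torch.rand(n) * 2 * torch.pi     # the single underlying degree of freedom
basis, _ = torch.linalg.qr(torch.randn(d, 2))  # random 2-D plane in R^d
x = torch.stack([theta.cos(), theta.sin()], dim=1) @ basis.T

# A standard ReLU SAE trained with reconstruction loss + an L1 sparsity penalty.
enc = torch.nn.Linear(d, k)
dec = torch.nn.Linear(k, d)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

for step in range(2000):
    feats = torch.relu(enc(x))
    loss = ((dec(feats) - x) ** 2).mean() + 3e-3 * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Inspect fragmentation: in this toy setting, each feature tends to fire
# only on a narrow arc of angles, so no single feature captures the
# circle's continuous structure.
with torch.no_grad():
    feats = torch.relu(enc(x))
    for j in range(k):
        arc = theta[feats[:, j] > 0.1]
        if len(arc) > 0:
            print(f"feature {j:2d}: {len(arc):4d} pts, "
                  f"angles ~[{arc.min().item():.2f}, {arc.max().item():.2f}]")
```

In this toy setting the sparsity pressure typically splits the circle into many locally active features rather than a few that jointly encode the angle, which is the fragmentation the summary describes: the geometry is still there, but only visible by looking at the group of features together.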