Beyond the Hard Budget: Sparsity Regularizers for More Interpretable Top-k Sparse Autoencoders

Nathanaël Jacquier, Maria Vakalopoulou, Mahdi S. Hosseini | June 25, 2026 | arXiv

Key Takeaway

Adding soft sparsity regularizers to Top-k sparse autoencoders makes interpretable features more robust and concentrated, without the drawbacks of earlier penalty-based approaches: hard and soft sparsity work better together.

Summary

This paper improves sparse autoencoders (SAEs) for interpreting vision models by adding sparsity regularizers to the Top-k SAE architecture. The researchers introduce two penalty methods that work alongside Top-k's hard sparsity constraint to make learned features more interpretable (monosemantic) without hurting reconstruction quality.
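
To make the combination of hard and soft sparsity concrete, here is a minimal NumPy sketch of a Top-k SAE loss with an added penalty term. This summary does not specify the paper's two regularizers, so the L1 penalty below is only a stand-in, and the function name `topk_sae_loss`, the weight `lam`, and the parameter layout are all assumptions made for illustration rather than the authors' implementation.

```python
import numpy as np

def topk_sae_loss(x, W_enc, b_enc, W_dec, b_dec, k, lam):
    """Reconstruction loss plus a soft sparsity penalty for a Top-k SAE.

    x: (batch, d_model) activations; W_enc: (d_model, n_latents);
    W_dec: (n_latents, d_model). All names are hypothetical.
    """
    # Encoder with ReLU, so latent activations are non-negative.
    pre = np.maximum((x - b_dec) @ W_enc + b_enc, 0.0)

    # Hard sparsity: keep only the k largest latents in each row.
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]
    mask = np.zeros(pre.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    z = np.where(mask, pre, 0.0)

    # Linear decoder and mean squared reconstruction error per sample.
    x_hat = z @ W_dec + b_dec
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))

    # Soft sparsity: an L1 penalty on the surviving latents
    # (a placeholder assumption, not the paper's regularizers).
    penalty = np.mean(np.sum(np.abs(z), axis=1))

    return recon + lam * penalty, z
```

With `lam = 0` this reduces to a plain Top-k SAE objective; a positive `lam` additionally pushes the k surviving activations toward smaller magnitudes, which is one simple way a soft penalty can act inside the hard budget.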

efficiency evaluation

Key Terms

sparse-autoencoder, monosemanticity, polysemanticity, sparsity-regularizer