LLM embeddings can be significantly improved by filtering out a specific subspace, encoded in the unembedding matrix, that captures frequent tokens. The same filtering also enables dimensionality reduction without quality loss.
This paper shows that LLM embeddings are dominated by frequent but meaningless tokens, and that this dominance hurts their quality on text search tasks. The authors propose EmbedFilter, a simple linear transformation that removes this noise by filtering out the subspace into which the model's unembedding matrix writes high-frequency tokens.
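
To make the idea of "filtering out a subspace with a linear transformation" concrete, here is a minimal NumPy sketch. It is not the authors' EmbedFilter procedure, which this summary does not spell out. It assumes that a list of high-frequency token ids (`frequent_ids`) is already available, that the subspace can be estimated with an SVD of the matching unembedding rows, and that the number of removed directions `k` is chosen by hand. The function name `frequent_token_filter` and the toy matrices are hypothetical.

```python
import numpy as np


def frequent_token_filter(unembed, frequent_ids, k, reduce_dim=False):
    """Build a linear map that removes k directions tied to frequent tokens.

    Assumptions (not taken from the paper): `frequent_ids` is a precomputed
    list of high-frequency token ids, and the subspace is estimated with an
    SVD of the corresponding unembedding rows.
    """
    d = unembed.shape[1]
    rows = unembed[frequent_ids]  # (n_frequent, d)
    # Rows of vt form an orthonormal basis of R^d, ordered by how strongly
    # the frequent-token rows load on each direction.
    _, _, vt = np.linalg.svd(rows, full_matrices=True)
    noise, keep = vt[:k], vt[k:]  # (k, d) and (d - k, d)
    if reduce_dim:
        # Express embeddings in the kept basis: output has d - k dimensions.
        return keep.T
    # Project onto the orthogonal complement: output stays d-dimensional.
    return np.eye(d) - noise.T @ noise


rng = np.random.default_rng(0)
unembed = rng.normal(size=(50, 8))      # toy (vocab_size, d) matrix
embeddings = rng.normal(size=(4, 8))    # toy text embeddings
frequent_ids = [0, 1, 2]                # hypothetical frequent-token ids

P = frequent_token_filter(unembed, frequent_ids, k=3)
filtered = embeddings @ P               # same shape, subspace removed
R = frequent_token_filter(unembed, frequent_ids, k=3, reduce_dim=True)
reduced = embeddings @ R                # shape (4, 5)
```

In this sketch the reduced embeddings have the same pairwise dot products as the filtered ones, because the kept basis is orthonormal. That is one way a filter of this kind can shrink the dimension without changing similarity scores. Whether the paper obtains its dimensionality reduction in the same way is not stated here.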