You can boost embedding model performance on hard search tasks by having an LLM refine queries at test time, making embeddings practical in settings where running an LLM over every document is too expensive.
This paper shows how to improve embedding models for search and classification by using an LLM to refine user queries in real time. Instead of changing the embedding model itself, the approach adjusts the query representation based on feedback from a small sample of documents, achieving up to a 25% improvement on challenging tasks without requiring expensive LLM processing over the whole corpus.
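To make the pattern concrete, here is a minimal sketch of test-time query refinement under stated assumptions, not the paper's exact algorithm: the query embedding is nudged toward documents judged relevant (a Rocchio-style pseudo-relevance-feedback update), so the feedback model only ever sees the query plus a handful of retrieved documents. The names `judge_relevance` and `refine_query_embedding`, the sample sizes, and the sentence-transformers model are illustrative choices; in the paper's setting, `judge_relevance` would be backed by an LLM prompt rather than the token-overlap placeholder used here.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def judge_relevance(query: str, doc: str) -> bool:
    """Placeholder relevance judgment (token overlap). In practice this would
    be an LLM prompt asking whether `doc` helps answer `query`."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) >= 2

def refine_query_embedding(query: str, docs: list[str],
                           sample_k: int = 5, alpha: float = 0.3) -> np.ndarray:
    """Adjust the query representation using feedback on a small document sample."""
    doc_emb = model.encode(docs, normalize_embeddings=True)
    q = model.encode([query], normalize_embeddings=True)[0]

    # Cheap first pass: embedding-only retrieval of a handful of candidates.
    top = np.argsort(-(doc_emb @ q))[:sample_k]

    # Feedback (an LLM in the paper, a heuristic here) on just those few documents.
    relevant = [doc_emb[i] for i in top if judge_relevance(query, docs[i])]

    # Rocchio-style nudge of the query embedding toward the relevant documents.
    if relevant:
        q = q + alpha * np.mean(relevant, axis=0)
        q = q / np.linalg.norm(q)
    return q

if __name__ == "__main__":
    corpus = [
        "Transformers use self-attention over token embeddings.",
        "Gradient boosting builds an ensemble of decision trees.",
        "Dense retrieval ranks documents by embedding similarity.",
    ]
    refined = refine_query_embedding("how does dense retrieval rank documents", corpus)
    print(refined.shape)
```

The cost profile is the point: the expensive model is called at most `sample_k` times per query, independent of corpus size, while the final ranking still runs entirely in embedding space.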