You can boost embedding model performance on hard search tasks by having an LLM refine queries at test time, making embeddings practical in settings where running an LLM over every document is too expensive.
This paper shows how to improve embedding models for search and classification by using an LLM to refine user queries in real time. Instead of changing the embedding model itself, the approach adjusts the query representation based on feedback from a small sample of documents, achieving up to a 25% improvement on challenging tasks without requiring expensive LLM processing over the whole corpus.
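To make the pattern concrete, here is a minimal sketch of test-time query refinement under stated assumptions, not the paper's exact algorithm: the query embedding is nudged toward documents judged relevant (a Rocchio-style pseudo-relevance-feedback update), so the feedback model only ever sees the query plus a handful of retrieved documents. The names `judge_relevance` and `refine_query_embedding`, the sample sizes, and the sentence-transformers model are illustrative choices; in the paper's setting, `judge_relevance` would be backed by an LLM prompt rather than the token-overlap placeholder used here.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def judge_relevance(query: str, doc: str) -> bool:
    """Placeholder relevance judgment (token overlap). In practice this would
    be an LLM prompt asking whether `doc` helps answer `query`."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) >= 2

def refine_query_embedding(query: str, docs: list[str],
                           sample_k: int = 5, alpha: float = 0.3) -> np.ndarray:
    """Adjust the query representation using feedback on a small document sample."""
    doc_emb = model.encode(docs, normalize_embeddings=True)
    q = model.encode([query], normalize_embeddings=True)[0]

    # Cheap first pass: embedding-only retrieval of a handful of candidates.
    top = np.argsort(-(doc_emb @ q))[:sample_k]

    # Feedback (an LLM in the paper, a heuristic here) on just those few documents.
    relevant = [doc_emb[i] for i in top if judge_relevance(query, docs[i])]

    # Rocchio-style nudge of the query embedding toward the relevant documents.
    if relevant:
        q = q + alpha * np.mean(relevant, axis=0)
        q = q / np.linalg.norm(q)
    return q

if __name__ == "__main__":
    corpus = [
        "Transformers use self-attention over token embeddings.",
        "Gradient boosting builds an ensemble of decision trees.",
        "Dense retrieval ranks documents by embedding similarity.",
    ]
    refined = refine_query_embedding("how does dense retrieval rank documents", corpus)
    print(refined.shape)
```

The cost profile is the point: the expensive model is called at most `sample_k` times per query, independent of corpus size, while the final ranking still runs entirely in embedding space.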