Combining supervised retrieval scoring with zero-shot LLM reasoning can substantially improve dataset discovery: score fusion yields 5x better recall, and agentic reranking adds a further 28% improvement without any additional training.
This paper presents an agentic search system that helps geoscience researchers find relevant datasets and tools from NASA's Earth Observation Knowledge Graph using natural language queries. The system combines supervised neural retrieval with LLM-based reasoning to significantly improve search accuracy, and includes a new benchmark of 47k query-dataset pairs for evaluation.
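To make the two-stage idea concrete, the following is a minimal sketch of score fusion followed by an optional reranking step. The summary does not state which fusion rule the system uses, so the min-max normalization, the weighted sum, the `alpha` weight, and all function and parameter names here (`fuse_scores`, `search`, `llm_scores`, `rerank`) are illustrative assumptions, not the paper's method. The `rerank` callable is a hypothetical stand-in for an LLM-based agent.

```python
from typing import Callable, Dict, List, Optional


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse_scores(
    supervised: Dict[str, float],
    llm_scores: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Weighted sum of normalized scores (assumed fusion rule).

    A dataset missing from one source contributes 0 for that source.
    """
    a, b = min_max(supervised), min_max(llm_scores)
    return {
        doc_id: alpha * a.get(doc_id, 0.0) + (1 - alpha) * b.get(doc_id, 0.0)
        for doc_id in set(a) | set(b)
    }


def search(
    supervised: Dict[str, float],
    llm_scores: Dict[str, float],
    top_k: int = 3,
    alpha: float = 0.5,
    rerank: Optional[Callable[[List[str]], List[str]]] = None,
) -> List[str]:
    """Fuse scores, keep the top_k candidates, then optionally rerank them."""
    fused = fuse_scores(supervised, llm_scores, alpha)
    # Sort by descending fused score; break ties by id for determinism.
    ranked = sorted(fused, key=lambda d: (-fused[d], d))[:top_k]
    if rerank is not None:
        # Hypothetical hook: an LLM agent would reorder the shortlist here.
        ranked = rerank(ranked)
    return ranked


if __name__ == "__main__":
    supervised = {"a": 4.0, "b": 0.0, "c": 2.0}
    llm_scores = {"b": 8.0, "c": 6.0, "d": 0.0}
    print(search(supervised, llm_scores, top_k=3, alpha=0.75))
```

Normalizing before fusing keeps one scorer from dominating simply because its raw scores sit on a larger scale. Restricting the reranker to a short list is one plausible way to limit LLM calls, though the summary does not say how the agentic stage selects its candidates.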