Natural Language Query to Configuration for Retrieval Agents

Melissa Z. Pan, Negar Arabzadeh, Mathew Jacob, Fiodar Kazhamiaka, Esha Choukse, et al. | May 26, 2026 | arXiv

Key Takeaway

Retrieval pipelines can be optimized per query rather than per workload. Lightweight predictors trained on query characteristics choose a configuration for each query, which matches the accuracy of a single per-workload configuration at significantly lower cost, or improves accuracy at the same cost.
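To make "a lightweight predictor trained on query characteristics" concrete, here is a minimal sketch that uses k-nearest-neighbors as a stand-in. The summary does not say which predictor BRANE uses, so the kNN choice, the `LabeledQuery` type, the numeric feature vectors, and the config names are all hypothetical. For a new query, the sketch estimates each configuration's accuracy as the fraction of the k most similar training queries that the configuration answered correctly.

```python
import math
from dataclasses import dataclass


@dataclass
class LabeledQuery:
    # Hypothetical numeric query features (e.g., length, number of entities).
    features: list[float]
    # Config name -> whether that config answered this training query correctly.
    correct: dict[str, bool]


def predict_accuracy(query_features, train, k=3):
    """kNN stand-in predictor: for each config, return the fraction of the k
    nearest training queries (Euclidean distance) that it answered correctly."""
    if not train:
        raise ValueError("need at least one labeled training query")
    nearest = sorted(train, key=lambda q: math.dist(q.features, query_features))[:k]
    # Assumes every training query is labeled for the same set of configs.
    configs = nearest[0].correct.keys()
    return {c: sum(q.correct[c] for q in nearest) / len(nearest) for c in configs}


# Toy, illustrative labels only (not data from the paper).
train = [
    LabeledQuery([1.0, 0.0], {"small": True, "large": True}),
    LabeledQuery([1.2, 0.1], {"small": True, "large": True}),
    LabeledQuery([5.0, 3.0], {"small": False, "large": True}),
    LabeledQuery([5.5, 2.8], {"small": False, "large": True}),
    LabeledQuery([0.9, 0.2], {"small": True, "large": False}),
]
print(predict_accuracy([1.1, 0.1], train, k=3))  # small: 1.0, large: ~0.667
```

Simple-looking queries land near training queries that the cheap config already handled, so the predictor rates it highly. Queries in the "hard" region favor the larger config.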

Summary

This paper presents BRANE, a system that automatically selects the best configuration for retrieval agents on a per-query basis. Instead of tuning a retrieval pipeline once by hand and applying that single configuration to every query, BRANE analyzes each incoming query and predicts which combination of LLM, retriever, and other settings will perform best. Because the choice is made at inference time, teams can optimize for either accuracy or cost without retraining.
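The per-query selection step can be sketched as follows, assuming some predictor has already produced a predicted accuracy for each candidate configuration on the current query. This is not BRANE's actual interface: `Config`, `select_config`, the two objectives, the cost units, and the fallback rule are hypothetical choices made for illustration. The point is that switching between "maximize accuracy within a cost budget" and "minimize cost subject to an accuracy target" is just a parameter at inference time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str    # hypothetical label, e.g. "small-LLM + BM25"
    cost: float  # assumed per-query cost in arbitrary units


def select_config(predicted, configs, objective, cost_budget=None, accuracy_target=None):
    """Pick one configuration for a single query.

    predicted: config name -> predicted accuracy for this query.
    objective: "accuracy" maximizes predicted accuracy within cost_budget (if given);
               "cost" picks the cheapest config predicted to meet accuracy_target.
    """
    def most_accurate(candidates):
        # Highest predicted accuracy; ties broken by lower cost.
        return max(candidates, key=lambda c: (predicted[c.name], -c.cost))

    if objective == "accuracy":
        affordable = [c for c in configs if cost_budget is None or c.cost <= cost_budget]
        if not affordable:
            raise ValueError("no configuration fits the cost budget")
        return most_accurate(affordable)
    if objective == "cost":
        if accuracy_target is None:
            raise ValueError("objective 'cost' requires accuracy_target")
        good_enough = [c for c in configs if predicted[c.name] >= accuracy_target]
        if not good_enough:
            # Assumed fallback: if nothing meets the target, take the most accurate config.
            return most_accurate(configs)
        return min(good_enough, key=lambda c: (c.cost, -predicted[c.name]))
    raise ValueError(f"unknown objective: {objective!r}")


configs = [Config("small", 1.0), Config("large", 10.0)]
predicted = {"small": 0.90, "large": 0.95}  # illustrative predictions for one query
print(select_config(predicted, configs, "cost", accuracy_target=0.85).name)  # small
print(select_config(predicted, configs, "accuracy").name)                    # large
```

The same predictions serve both objectives, which is why changing the accuracy/cost preference does not require retraining the predictor in this sketch.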

agents, efficiency, evaluation

Key Terms

retrieval-augmented generation, cost-quality tradeoff, pipeline configuration, per-query optimization