BERAG: Bayesian Ensemble Retrieval-Augmented Generation for Knowledge-based Visual Question Answering

Jinghong Chen, Jingbiao Mei, Guangyu Yang, Bill Byrne | April 24, 2026 | arXiv

Key Takeaway

BERAG treats retrieved documents as an ensemble whose probabilistic weights are updated during generation. This lets it avoid concatenating the documents into one long context while improving both performance and interpretability. The approach is especially valuable for visual question answering, where context length is expensive.

Summary

This paper proposes BERAG, a retrieval-augmented generation system that processes retrieved documents individually rather than concatenating them into one long context. Instead of treating all documents equally, BERAG uses Bayesian inference to weight documents based on how useful they are during answer generation, updating these weights token-by-token.
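The token-by-token weighting can be illustrated with a minimal sketch. It assumes a standard Bayesian mixture over documents. The next-token distribution is p(y_t | x, y_<t) = sum_d p(d | x, y_<t) * p(y_t | x, d, y_<t), and after each token the weight of document d is multiplied by the probability it assigned to that token and then renormalized. The exact prior, likelihood, and decoding rule BERAG uses are not given in this summary. Everything below is therefore an assumption: greedy decoding, a caller-supplied prior, and the hypothetical names `bayesian_ensemble_decode` and `toy_model`, where `toy_model` stands in for a generator conditioned on one document at a time.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: (document index, tokens so far) -> next-token distribution.
NextTokenFn = Callable[[int, Sequence[str]], Dict[str, float]]


def _normalize(log_w: List[float]) -> List[float]:
    # Convert log-weights to probabilities stably (subtract the max first).
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    z = sum(w)
    return [v / z for v in w]


def bayesian_ensemble_decode(
    prior: List[float],
    next_token_probs: NextTokenFn,
    max_len: int,
    eos: str = "<eos>",
) -> Tuple[List[str], List[float]]:
    """Greedy decoding with a Bayesian mixture over documents (illustrative sketch)."""
    # Prior weights must be strictly positive; work in log space to avoid underflow.
    log_w = [math.log(p) for p in prior]
    output: List[str] = []
    for _ in range(max_len):
        w = _normalize(log_w)  # current posterior p(d | x, y_<t)
        # Each document is conditioned on separately: no long concatenated context.
        per_doc = [next_token_probs(d, output) for d in range(len(prior))]
        vocab = set().union(*per_doc)
        # Mixture: sum over documents of weight * document-conditioned probability.
        mixture = {
            tok: sum(w[d] * per_doc[d].get(tok, 0.0) for d in range(len(prior)))
            for tok in vocab
        }
        token = max(mixture, key=mixture.get)
        # Bayes update: scale each document's weight by the likelihood it gave
        # the chosen token (floored to avoid log(0)).
        log_w = [
            lw + math.log(max(per_doc[d].get(token, 0.0), 1e-12))
            for d, lw in enumerate(log_w)
        ]
        output.append(token)
        if token == eos:
            break
    return output, _normalize(log_w)


def toy_model(doc: int, prefix: Sequence[str]) -> Dict[str, float]:
    # Hypothetical stand-in: document 0 supports "Paris", document 1 is a distractor.
    if not prefix:
        return {"Paris": 0.9, "Lyon": 0.1} if doc == 0 else {"Paris": 0.3, "Lyon": 0.7}
    return {"<eos>": 1.0}


if __name__ == "__main__":
    tokens, weights = bayesian_ensemble_decode([0.5, 0.5], toy_model, max_len=5)
    print(tokens, weights)
```

With a uniform prior, the first token is "Paris" (mixture probability 0.6 vs 0.4). The posterior then shifts to about 0.75 for document 0 and 0.25 for document 1. Those final weights act as a per-document attribution signal, which is the interpretability benefit the summary describes.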

multimodal reasoning

Key Terms

RAG (retrieval-augmented generation), lost in the middle, ensemble weights, document attribution, visual question answering