By treating retrieved documents as an ensemble with probabilistic weights updated during generation, BERAG avoids concatenating long contexts while improving both performance and interpretability—especially valuable for visual question answering where context length is expensive.
This paper proposes BERAG, a retrieval-augmented generation system that processes retrieved documents individually rather than concatenating them into one long context. Instead of treating all documents equally, BERAG uses Bayesian inference to weight documents based on how useful they are during answer generation, updating these weights token-by-token.