When building agentic systems that reason over visual documents, maintaining structured evidence across pages and actively managing context drift through sliding windows and intent injection significantly improve both accuracy and efficiency.
VISOR is an AI system that helps vision-language models retrieve and reason over visually rich documents by combining iterative search with multi-step reasoning.
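
The three ideas named above (a structured evidence record kept across pages, a sliding window over recent turns, and re-injection of the original intent) can be illustrated with a minimal Python sketch. This is not VISOR's actual implementation: the class `AgentContext`, its methods, the prompt layout, and the default window size are all hypothetical names and choices made for illustration only.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Evidence:
    """One structured finding, tied to the page it came from."""
    page: int
    note: str


class AgentContext:
    """Hypothetical context manager for a document-reasoning agent.

    Assumptions (not taken from the source): evidence is kept for the whole
    session, only the last `window_size` turns are kept verbatim, and the
    user's intent is placed at the top of every prompt.
    """

    def __init__(self, intent, window_size=3):
        self.intent = intent
        self.evidence = []                             # persists across pages
        self.recent_turns = deque(maxlen=window_size)  # sliding window

    def add_turn(self, text):
        # Once the window is full, the oldest turn is dropped automatically.
        self.recent_turns.append(text)

    def add_evidence(self, page, note):
        self.evidence.append(Evidence(page, note))

    def build_prompt(self):
        # Intent injection: restate the original goal on every step so that
        # long search trajectories do not drift away from it.
        lines = [f"Intent: {self.intent}", "Evidence:"]
        lines += [f"- [page {e.page}] {e.note}" for e in self.evidence]
        lines.append("Recent turns:")
        lines += list(self.recent_turns)
        return "\n".join(lines)


if __name__ == "__main__":
    ctx = AgentContext("What was Q3 revenue?", window_size=2)
    for i in range(4):
        ctx.add_turn(f"turn {i}")
    ctx.add_evidence(5, "Q3 revenue table")
    print(ctx.build_prompt())
```

In this sketch the evidence list grows independently of the turn window, so findings from early pages remain available even after the turns that produced them have been evicted. The bounded `deque` keeps prompt length roughly constant as the search proceeds.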