How Robust is OCR-Reasoning? Evaluating OCR-Reasoning Robustness of Vision-Language Models under Visual Perturbations

Yuxing Cheng, Yuan Wu, Yi Chang | June 24, 2026 | arXiv

Key Takeaway

High accuracy on clean images does not guarantee robustness to visual corruption. VLMs struggle significantly with degraded text-rich content, especially structured formats like charts and tables, and that gap matters for real-world deployment.

Summary

This paper introduces OCR-Robust, a benchmark for testing how well vision-language models handle text recognition and reasoning when images are corrupted or degraded.

evaluation multimodal

Key Terms

ocr vision-language-models robustness-evaluation visual-perturbations