The Character Error Vector: Decomposable errors for page-level OCR evaluation

Jonathan Bourne, Mwiza Simbeye, Joseph Nockels | April 7, 2026 | arXiv

Key Takeaway

CEV lets you diagnose whether OCR problems come from layout parsing or character recognition itself, helping teams focus improvements where they'll have the most impact on document extraction quality.

Summary

This paper introduces the Character Error Vector (CEV), a metric for evaluating OCR quality that decomposes page-level error into parsing, OCR, and interaction components. Unlike the traditional Character Error Rate (CER), CEV remains usable even when text layout parsing fails, making it practical for real-world document images with complex layouts.
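For context, the conventional Character Error Rate that CEV is contrasted with can be sketched as below. This is the standard definition (character-level edit distance divided by reference length), not the CEV computation, whose formula is not given in this summary. The function names `levenshtein` and `cer` and the two-column sample strings are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Standard CER: edit distance normalised by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical page with two columns: every character is recognised
# correctly, but the regions are emitted in the wrong reading order.
reference = "left column text\nright column text"
misordered = "right column text\nleft column text"

print(cer(reference, reference))   # identical strings give 0.0
print(cer(reference, misordered))  # non-zero despite perfect recognition
```

The misordered case shows why a single page-level CER is hard to interpret: the score is penalised by the ordering mistake even though no character was misread, which is the kind of conflation a decomposed metric aims to separate.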

evaluation applications

Key Terms

ocr, character-error-rate, document-understanding, page-parsing