A focused document-understanding specialist that combines vision and language to extract and interpret text from images. It handles OCR tasks on multimodal input, processing both images and text within a large context window. As an open-weight model, it's transparent and adaptable, though its scope is narrow: visual text recognition rather than general-purpose reasoning.