A compact 1.2B-parameter model from opendatalab that accepts both text and image inputs. Its name suggests multimodal document understanding, such as document parsing or data extraction, though details of its capabilities beyond multimodal input support are limited. It has a 32K-token context window and is openly available under the Apache 2.0 license.