A mid-sized multimodal model that accepts text and image inputs and produces text output. Released as open weights under the Apache 2.0 license, it can be run locally and modified freely. Beyond its multimodal input support and FP8 quantization format, little is documented about its specific capabilities or performance characteristics.