A multimodal model that takes both text and images as input, producing text responses. It operates under an open-weight Apache 2.0 license, making its weights freely available for inspection and deployment. Details about its specific reasoning style or standout capabilities beyond image-and-text understanding are limited.