A large multimodal model that accepts both text and images as input and produces text output. It is released as an open-weight model under the Apache 2.0 license, so its weights are freely available for inspection and deployment. Beyond its multimodal input support, few details are available about its reasoning style, strengths, or trade-offs.