A mid-sized multimodal model that accepts both text and image inputs, distributed in a 6-bit quantized MLX format for Apple Silicon. Its thinking-capable architecture lets it reason at length before producing a response. As a community-packaged set of weights, it trades some numerical precision for a smaller footprint and easier local deployment.
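
To make the precision-for-size trade concrete, the sketch below estimates how much memory the weights might occupy at 6 bits compared with 16-bit floats. It is a rough back-of-the-envelope calculation, not a measurement of this model: the parameter count is a hypothetical placeholder (the description gives none), and the group size of 64 with 32 bits of per-group scale and bias overhead is an assumption about a typical group-wise quantization layout, not a confirmed property of this package.

```python
def quantized_weight_bytes(
    num_params: int,
    bits: int = 6,
    group_size: int = 64,              # assumed group size, not confirmed for this package
    overhead_bits_per_group: int = 32, # assumed: one 16-bit scale + one 16-bit bias per group
) -> float:
    """Approximate bytes needed to store group-wise quantized weights."""
    if num_params < 0 or bits <= 0 or group_size <= 0:
        raise ValueError("num_params must be >= 0; bits and group_size must be > 0")
    # Each weight costs `bits`, plus its share of the per-group metadata.
    effective_bits = bits + overhead_bits_per_group / group_size
    return num_params * effective_bits / 8


def fp16_weight_bytes(num_params: int) -> int:
    """Bytes needed to store the same weights as 16-bit floats."""
    return num_params * 2


if __name__ == "__main__":
    # Hypothetical parameter count, chosen only for illustration.
    hypothetical_params = 8_000_000_000
    q = quantized_weight_bytes(hypothetical_params)
    f = fp16_weight_bytes(hypothetical_params)
    print(f"6-bit (estimated): {q / 1e9:.1f} GB")
    print(f"fp16:              {f / 1e9:.1f} GB")
    print(f"ratio:             {q / f:.2f}")
```

Under these assumptions each weight costs about 6.5 bits, so the quantized weights take roughly 40% of the fp16 size. Activations, the KV cache, and any vision-encoder buffers are not included, so actual memory use at runtime will be higher.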