A mid-sized multimodal model from Google's Gemma 4 family, quantized to 4-bit precision for efficient local deployment via MLX. It accepts both text and image inputs, so it can handle visual-understanding tasks such as image description and visual question answering alongside ordinary text work. The 4-bit quantization cuts the memory footprint roughly fourfold relative to 16-bit weights, at the cost of some potential quality loss compared to the full-precision version.
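As a rough illustration of the memory savings, the weight-only footprint can be estimated from the parameter count and bits per weight. The parameter count below is hypothetical (chosen only for illustration, not the model's actual size), and the 4.5 bits/weight figure is an assumption that folds in typical per-group quantization overhead (scales and biases):

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory footprint in GiB.

    Ignores activations, KV cache, and runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1024**3

# Hypothetical 12B-parameter model (illustrative only).
n = 12e9
fp16 = weight_memory_gib(n, 16)   # full 16-bit weights
q4 = weight_memory_gib(n, 4.5)    # ~4 bits plus per-group scale/bias overhead
print(f"fp16: {fp16:.1f} GiB, 4-bit: {q4:.1f} GiB")
```

For the hypothetical 12B model this works out to roughly 22 GiB at 16-bit versus under 7 GiB at 4-bit, which is what makes local deployment on consumer hardware practical.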