A mid-sized multimodal model that accepts both text and image inputs, with weights quantized to 8-bit to reduce its memory footprint. Quantization trades some numerical precision for accessibility, so the model can run on hardware that could not fit the full-precision version. Its behavior follows from the underlying Gemma 4 model, while the MLX-community packaging optimizes it for Apple Silicon workflows.
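
The memory saving from 8-bit quantization can be sketched with simple arithmetic: weight memory is roughly the parameter count times the bits per weight. The snippet below is a rough illustration, not a measurement of this model. The function name `weight_memory_gib` and the parameter count `EXAMPLE_PARAMS` are hypothetical, chosen only for the example, since the description does not state the model's size. It also ignores activations, the KV cache, and the per-group scale metadata that quantized formats usually store, so real usage will be somewhat higher.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# Hypothetical parameter count, used only for illustration.
# It is NOT the actual size of this model.
EXAMPLE_PARAMS = 10_000_000_000

fp16 = weight_memory_gib(EXAMPLE_PARAMS, 16)  # 16-bit baseline
int8 = weight_memory_gib(EXAMPLE_PARAMS, 8)   # 8-bit quantized weights

print(f"16-bit weights: {fp16:.1f} GiB")
print(f" 8-bit weights: {int8:.1f} GiB")
```

Under these assumptions the 8-bit weights take half the memory of a 16-bit copy. Passing a slightly larger `bits_per_weight` is one way to account for quantization metadata.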