A multimodal open-weight model from Google's Gemma family, capable of processing both text and images as input. It runs in a quantized form (W4A16, channel-wise), meaning weights are compressed to 4-bit integers while activations stay at 16-bit — a trade-off that reduces memory footprint at some potential cost to precision. Practical for deployment on constrained hardware.