A compact multimodal worker that punches above its weight through aggressive 4-bit quantization, trading some precision for dramatically reduced memory footprint. It handles both text and image inputs with the practical sensibility typical of Gemma models — clear reasoning, grounded responses, and minimal hallucination drama. The quantization means you may notice occasional degradation on nuanced or highly technical prompts compared to full-precision variants.
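The memory/precision trade-off can be sketched with a toy 4-bit symmetric quantizer. This is purely illustrative, assuming a simple per-tensor scale; the model's actual quantization scheme (group sizes, zero points, etc.) may differ.

```python
import numpy as np

# Illustrative 4-bit symmetric quantization round trip (not the model's
# exact scheme): map float weights onto 16 signed integer levels and back.
rng = np.random.default_rng(0)
weights = rng.normal(0, 0.02, size=4096).astype(np.float32)

scale = np.abs(weights).max() / 7            # signed 4-bit range is -8..7
q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
dequant = q.astype(np.float32) * scale

# Memory: 4 bits per weight vs 16 bits for fp16 -> 4x smaller footprint.
fp16_bytes = weights.size * 2
int4_bytes = weights.size // 2               # two 4-bit values per byte
print(fp16_bytes / int4_bytes)               # → 4.0

# Precision: rounding error is bounded by half a quantization step,
# which is the "some precision" being traded away.
err = np.abs(weights - dequant).max()
print(err <= scale / 2)                      # → True
```

The same arithmetic explains why degradation shows up mostly on nuanced prompts: coarse weight rounding perturbs every layer slightly, and those small errors matter most where the full-precision model was already operating near decision boundaries.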