A compact multimodal model that accepts text and image inputs and produces text output. It is an NVFP4-quantized variant of Gemma 4. Storing weights in 4-bit floating point trades some precision for a smaller memory footprint and faster inference, particularly on GPUs with native FP4 support. Because the weights are openly released, the model can be inspected and run locally.
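The NumPy sketch below makes the precision/memory trade-off concrete by applying NVFP4-style block quantization to a weight vector. In NVFP4, each weight is stored as a 4-bit E2M1 float, and every block of 16 weights shares one FP8 (E4M3) scale. An additional per-tensor FP32 scale is also applied. The sketch simplifies this in two ways: the block scale is kept in full precision rather than rounded to E4M3, and the per-tensor scale is omitted. The function names (`fake_quantize`, `bits_per_weight`) are hypothetical. The sketch illustrates the format in general, not how this particular checkpoint was produced.

```python
import numpy as np

# Non-negative magnitudes representable by an FP4 E2M1 element (sign stored separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 16  # NVFP4 shares one scale across each block of 16 elements


def fake_quantize(w: np.ndarray) -> np.ndarray:
    """Quantize to E2M1 with one scale per 16-element block, then dequantize.

    Hypothetical helper. Simplifications: the block scale stays in full precision
    (real NVFP4 rounds it to FP8 E4M3) and no per-tensor scale is applied.
    Assumes w.size is a multiple of BLOCK.
    """
    blocks = w.reshape(-1, BLOCK)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    # Choose each block's scale so its largest magnitude maps to 6.0, the E2M1 maximum.
    scale = np.where(amax > 0, amax / 6.0, 1.0)
    scaled = np.abs(blocks) / scale
    # Round every magnitude to the nearest grid point (ties go to the smaller one).
    idx = np.argmin(np.abs(scaled[..., None] - E2M1_GRID), axis=-1)
    deq = np.sign(blocks) * E2M1_GRID[idx] * scale
    return deq.reshape(w.shape)


def bits_per_weight(block: int = BLOCK, elem_bits: int = 4, scale_bits: int = 8) -> float:
    # Each element costs 4 bits, plus its share of one 8-bit block scale.
    return elem_bits + scale_bits / block


rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
wq = fake_quantize(w)
print("relative error:", np.linalg.norm(w - wq) / np.linalg.norm(w))
print("bits per weight:", bits_per_weight())  # 4.5, versus 16 for BF16
```

At 4.5 bits per weight instead of 16 for BF16, quantized layers take roughly 3.5 times less memory. The lost precision comes from snapping each value to one of only eight magnitudes per block.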