A compact multimodal model that punches with efficiency — the NVFP4 quantization means it runs leaner on compatible hardware without dramatic quality loss. It handles both text and image inputs, making it versatile for vision-language tasks. The trade-off is that quantized weights can introduce subtle degradation on nuanced reasoning compared to full-precision counterparts.