A mid-sized multimodal model from the Qwen 3 family, repackaged by Unsloth in MLX format with NVFP4 quantization for efficient local inference. It accepts both text and image inputs, so it can handle visual understanding tasks alongside language work. Quantization trades some numerical precision for a smaller memory footprint and faster throughput on compatible hardware.
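To give a rough sense of the memory trade-off described above, here is a minimal back-of-the-envelope sketch. It assumes NVFP4 stores each weight as a 4-bit value with one 8-bit scale shared per block of 16 weights, which works out to about 4.5 bits per weight. It ignores any per-tensor scale, layers left unquantized, activations, and the KV cache. The parameter count is a hypothetical placeholder, since the description does not state one, and the function names are illustrative only.

```python
def quantized_weight_bytes(num_params, value_bits=4, scale_bits=8, block_size=16):
    """Estimate weight storage for a block-scaled 4-bit format.

    Assumed layout: each weight takes `value_bits`, and every
    `block_size` weights share one `scale_bits` scale factor.
    """
    bits_per_weight = value_bits + scale_bits / block_size
    return num_params * bits_per_weight / 8


def bf16_weight_bytes(num_params):
    """Weight storage at 16 bits (2 bytes) per parameter."""
    return num_params * 2


if __name__ == "__main__":
    # Hypothetical parameter count, NOT taken from the model description.
    hypothetical_params = 30_000_000_000

    full = bf16_weight_bytes(hypothetical_params)
    quant = quantized_weight_bytes(hypothetical_params)

    gib = 1024 ** 3
    print(f"16-bit weights: {full / gib:.1f} GiB")
    print(f"4-bit weights:  {quant / gib:.1f} GiB")
    print(f"reduction:      {full / quant:.2f}x")
```

Under these assumptions the weights shrink by a bit more than 3.5x relative to 16-bit storage, whatever the parameter count. Actual file sizes will differ, because real checkpoints usually keep some tensors at higher precision.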