A mid-sized multimodal model that accepts both text and image inputs, packaged in a 4-bit quantized MLX format for local deployment on Apple Silicon. Quantization trades some numerical precision for a much smaller memory footprint, which puts the model within reach of consumer hardware. It inherits Qwen3.6's sparse mixture-of-experts architecture, activating around 3 billion of its 35 billion total parameters per forward pass.
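To put rough numbers on the memory trade-off, the sketch below estimates weight storage from the parameter counts given above. The group size of 64 and the 16-bit scale and bias stored per group are assumptions about the quantization layout, not details stated in this description, and the helper name `weight_memory_gb` is purely illustrative. The estimate covers weights only and ignores activations, the KV cache, and any layers that may be kept at higher precision.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 35e9   # total parameter count, from the description
ACTIVE_PARAMS = 3e9   # approximate active parameters, from the description

# Assumption: group-wise quantization where each group of 64 weights
# carries one 16-bit scale and one 16-bit bias (32 extra bits per group).
GROUP_SIZE = 64
OVERHEAD_BITS = 32 / GROUP_SIZE  # 0.5 extra bits per weight

fp16_gb = weight_memory_gb(TOTAL_PARAMS, 16)
q4_gb = weight_memory_gb(TOTAL_PARAMS, 4 + OVERHEAD_BITS)
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {q4_gb:.1f} GB")
print(f"Active share of parameters: {active_fraction:.1%}")
```

Under these assumptions the weights shrink from about 70 GB at 16-bit precision to just under 20 GB at 4 bits. The sparse design reduces the compute per forward pass, but all experts typically still need to be resident in memory, so the footprint is driven by the total parameter count rather than the active one.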