A mid-sized mixture-of-experts model from Qwen that punches above its compute cost: of 35B total parameters, only about 3B are activated per forward pass. It accepts both text and images, making it multimodal out of the box, and FP8 quantization keeps the memory footprint lean while preserving most of the full-precision capability.
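Since the blurb leans on the FP8 memory savings, a back-of-envelope sketch can show why halving bytes-per-parameter matters. The figures below are illustrative only: they count weight storage alone and ignore KV cache, activations, and runtime overhead, and all expert weights must reside in memory even though only ~3B parameters fire per token.

```python
# Back-of-envelope weight-memory estimate for a 35B-parameter MoE model.
# Illustrative assumption: 2 bytes/param for BF16, 1 byte/param for FP8.

TOTAL_PARAMS = 35e9   # all experts are stored in memory
ACTIVE_PARAMS = 3e9   # parameters actually used per forward pass

def weight_gib(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB at a given precision."""
    return params * bytes_per_param / 2**30

bf16 = weight_gib(TOTAL_PARAMS, 2)  # full 16-bit weights
fp8 = weight_gib(TOTAL_PARAMS, 1)   # FP8-quantized weights

print(f"BF16 weights: ~{bf16:.0f} GiB")
print(f"FP8 weights:  ~{fp8:.0f} GiB")
```

The gap between total and active parameters is the MoE trade: memory scales with 35B, but per-token compute scales with roughly 3B.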