A quantized variant of Qwen's mixture-of-experts architecture, activating around 3 billion parameters per forward pass despite a 35 billion total parameter count. This makes it notably efficient at inference time — doing more with less compute. The NVFP4 precision format targets NVIDIA hardware specifically, trading some numerical fidelity for speed and memory savings.