A large mixture-of-experts model that activates only 17 billion of its 397 billion total parameters per forward pass, making inference far cheaper than the total size suggests. It handles both text and images, switching between extended reasoning chains and direct responses depending on the task. The large total parameter count gives it broad knowledge coverage, though deploying it still demands serious hardware, since all weights must be held in memory even when only a fraction is active per token.
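The sparse-activation idea behind this efficiency can be sketched as top-k expert routing: a router scores all experts per token, but only the k highest-scoring experts actually run. The sketch below is illustrative only; the expert count, hidden size, and k are hypothetical toy values, not the model's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # hypothetical toy value, not the real expert count
TOP_K = 2         # experts activated per token
D_MODEL = 16      # hypothetical hidden size

# Each "expert" here is just a feed-forward weight matrix.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))

def moe_forward(x):
    """Route token x through only TOP_K of NUM_EXPERTS experts."""
    logits = x @ router_w
    chosen = np.argsort(logits)[-TOP_K:]          # indices of the top-k experts
    # Softmax over the selected logits only, for stable mixing weights.
    w = np.exp(logits[chosen] - logits[chosen].max())
    w /= w.sum()
    # Weighted sum of the chosen experts' outputs; the other experts never run.
    out = sum(wi * (x @ experts[i]) for wi, i in zip(w, chosen))
    return out, chosen

x = rng.standard_normal(D_MODEL)
y, chosen = moe_forward(x)

# Fraction of expert parameters touched for this token:
active_frac = TOP_K / NUM_EXPERTS   # 0.25 in this toy setup
```

This is why the active-parameter count (17B) rather than the total (397B) governs per-token compute, while the total still determines memory footprint.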