A large mixture-of-experts model that activates only 10B of its 122B total parameters per forward pass, making it far cheaper to run than its headline size suggests. It handles both text and images, and switches between a deliberate thinking mode and a faster direct-response mode depending on the task. FP8 quantization keeps the memory footprint manageable without a dramatic loss of capability.
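The sparsity and quantization claims above can be made concrete with some back-of-the-envelope arithmetic. The sketch below (illustrative estimates only, not measured figures for any specific deployment) computes the active-parameter fraction and compares approximate weight storage in FP8 (one byte per parameter) versus BF16 (two bytes per parameter); real-world usage adds KV cache, activations, and runtime overhead on top of the weights.

```python
# Back-of-the-envelope memory math for a 122B-total / 10B-active MoE model.
# Illustrative estimates only: weights-only storage, no KV cache or overhead.

def weight_memory_gib(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB for a given count and precision."""
    return params_billions * 1e9 * bytes_per_param / 2**30

TOTAL_B, ACTIVE_B = 122, 10

# FP8 stores one byte per weight; BF16 stores two.
fp8_gib = weight_memory_gib(TOTAL_B, 1.0)
bf16_gib = weight_memory_gib(TOTAL_B, 2.0)

print(f"Active fraction per token: {ACTIVE_B / TOTAL_B:.1%}")   # ~8.2%
print(f"Weights in BF16: ~{bf16_gib:.0f} GiB")                  # ~227 GiB
print(f"Weights in FP8:  ~{fp8_gib:.0f} GiB")                   # ~114 GiB
```

The point of the sparse design is visible in the first line: each token touches under a tenth of the weights, so per-token compute tracks the 10B active figure, while FP8 roughly halves the storage needed to hold the full 122B on device.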