A large mixture-of-experts model that activates only 10 billion of its 122 billion total parameters per forward pass, making it surprisingly efficient at inference time. It handles both text and images, switching between a deliberate thinking mode and a faster direct-response mode depending on the task. Like a specialist who knows when to slow down and reason carefully versus when to answer off the cuff, it balances depth with speed.
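The sparse-activation idea can be sketched as a toy top-k router: a gating network picks a small subset of experts per token, so only a fraction of the total weights participate in each forward pass. Expert count, top-k value, and dimensions below are illustrative assumptions, not this model's actual configuration.

```python
# Minimal sketch of sparse mixture-of-experts routing.
# NUM_EXPERTS, TOP_K, and D are toy values (assumptions), not the real config.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 64   # total experts (assumed)
TOP_K = 4          # experts activated per token (assumed)
D = 16             # hidden size (toy)

# Each expert is a simple linear layer; together they hold most of the parameters.
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((D, NUM_EXPERTS)) / np.sqrt(D)

def moe_forward(x):
    """Route a token through only TOP_K of NUM_EXPERTS experts."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]        # indices of the highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts only
    y = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return y, top

x = rng.standard_normal(D)
y, used = moe_forward(x)
print(f"experts used: {sorted(used.tolist())}, active fraction: {TOP_K / NUM_EXPERTS:.1%}")
```

Because only `TOP_K / NUM_EXPERTS` of the expert weights are touched per token, compute per forward pass scales with the active parameters rather than the total, which is the efficiency the paragraph describes.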