A large mixture-of-experts model that activates only 17 billion of its 397 billion parameters per forward pass, keeping inference costs manageable while drawing on a vast pool of specialized knowledge. It handles both text and images, reasoning across visual and written content with reasonable coherence. Because only a small subset of experts runs per token, it can deliver quality closer to that of a much larger dense model at a fraction of the compute cost, though behavior can feel uneven across domains depending on which experts engage.
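The sparse activation pattern described above can be sketched with a toy top-k router: a gating network scores all experts, but only the top-scoring few actually run for each token, so the compute per token tracks the *active* parameter count rather than the total. This is a minimal illustration, not the model's actual architecture; the dimensions, expert count, and top-k value here are invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16         # toy hidden size (assumption, for illustration only)
N_EXPERTS = 8  # total experts in the layer
TOP_K = 2      # experts actually activated per token

# Each expert is a small feed-forward block; only the routed ones run.
expert_weights = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router_weights = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def moe_forward(x):
    """Route token vector x to its top-k experts and mix their outputs."""
    logits = x @ router_weights
    top = np.argsort(logits)[-TOP_K:]       # indices of the k highest-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                     # softmax over the selected experts only
    # Only TOP_K of N_EXPERTS expert matmuls execute: compute scales with
    # active parameters, while total capacity scales with all experts.
    return sum(g * (x @ expert_weights[i]) for g, i in zip(gates, top))

token = rng.standard_normal(D)
out = moe_forward(token)
print(out.shape)
```

With these toy numbers, only 2 of 8 expert blocks run per token (25% of expert parameters), analogous to how activating 17B of 397B parameters keeps per-token cost far below what the total parameter count would suggest.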