A large mixture-of-experts model that activates only 2B of its 24B total parameters per forward pass, keeping inference lean while retaining broad knowledge. It handles text tasks with the efficiency you'd expect from a selectively activated architecture: fast responses without running the full model on every token. The 8-bit MLX quantization makes it particularly well suited to Apple Silicon machines, where memory bandwidth matters.
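
For readers on Apple Silicon, here is a minimal sketch of how an 8-bit MLX build like this would typically be run with the `mlx-lm` package. The repository id is a placeholder, not the model's actual name; substitute the real MLX quantization you intend to use.

```python
# Minimal sketch: loading and prompting an 8-bit MLX quantization with mlx-lm.
from mlx_lm import load, generate

# load() fetches the quantized weights and tokenizer; on Apple Silicon the
# weights live in unified memory shared by the CPU and GPU.
model, tokenizer = load("mlx-community/example-moe-24b-a2b-8bit")  # hypothetical repo id

# Only the routed experts (roughly 2B parameters) are evaluated per token,
# so decoding stays fast even though all 24B weights are resident in memory.
response = generate(
    model,
    tokenizer,
    prompt="Explain mixture-of-experts routing in two sentences.",
    max_tokens=128,
)
print(response)
```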