A large mixture-of-experts model that activates only about 2B of its 24B total parameters at a time, keeping inference lean without sacrificing breadth of knowledge. It handles text tasks with the speed you'd expect from a sparse architecture, since only the routed experts run on each forward pass, while the 5-bit MLX quantization shrinks the memory footprint and makes it particularly well suited to Apple Silicon.
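As a minimal sketch of running such a build with the `mlx-lm` package: the repository id below is a placeholder, not the model's actual name, and the prompt is illustrative. At 5 bits per weight, 24B parameters come to roughly 15 GB, so the weights fit in unified memory on higher-RAM Apple Silicon machines.

```python
# Minimal sketch using the mlx-lm package (pip install mlx-lm).
# The repository id is a placeholder; substitute the actual
# 5-bit MLX build of the model you intend to run.
from mlx_lm import load, generate

# load() fetches the quantized weights and tokenizer from a local
# path or a Hugging Face repo.
model, tokenizer = load("mlx-community/example-moe-24b-5bit")  # placeholder id

# Each forward pass routes tokens to a small subset of experts,
# so generation runs at roughly the cost of a ~2B dense model
# even though all 24B quantized parameters sit in memory.
print(generate(model, tokenizer,
               prompt="Summarize the benefits of sparse MoE inference.",
               max_tokens=128))
```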