MoE isn't just for giant models: on mobile devices, moderate sparsity with shared experts is both memory- and compute-optimal, delivering better performance with fewer active parameters than dense models.
MobileMoE brings the Mixture-of-Experts (MoE) architecture to phones and edge devices by optimizing it for their memory and compute constraints. The models use 0.3B to 0.9B active parameters yet outperform larger dense models, and they run 2-4× faster on real smartphones while using less memory.
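To make the difference between active and total parameters concrete, here is a minimal sketch of the counting for a single MoE feed-forward layer with shared experts and top-k routing. Every name and size in it (`MoELayerConfig`, the widths, the expert counts) is an illustrative assumption and not MobileMoE's actual configuration; it also assumes a plain two-matrix FFN per expert and ignores biases, attention, and embeddings.

```python
from dataclasses import dataclass


@dataclass
class MoELayerConfig:
    # All fields are hypothetical values chosen for illustration only.
    d_model: int    # width of the token representation
    d_expert: int   # hidden width of each expert FFN
    n_routed: int   # routed experts stored in the layer
    n_shared: int   # shared experts applied to every token
    top_k: int      # routed experts selected per token


def expert_params(cfg: MoELayerConfig) -> int:
    # One up-projection and one down-projection matrix, biases ignored.
    return 2 * cfg.d_model * cfg.d_expert


def router_params(cfg: MoELayerConfig) -> int:
    # A linear router that scores each routed expert.
    return cfg.d_model * cfg.n_routed


def total_params(cfg: MoELayerConfig) -> int:
    # Everything that must be held in memory.
    n_experts = cfg.n_routed + cfg.n_shared
    return n_experts * expert_params(cfg) + router_params(cfg)


def active_params(cfg: MoELayerConfig) -> int:
    # What a single token actually touches: shared experts plus top-k routed.
    n_used = cfg.n_shared + cfg.top_k
    return n_used * expert_params(cfg) + router_params(cfg)


cfg = MoELayerConfig(d_model=1024, d_expert=2048, n_routed=8, n_shared=1, top_k=2)
ratio = active_params(cfg) / total_params(cfg)
print(f"total={total_params(cfg):,} active={active_params(cfg):,} ratio={ratio:.2f}")
```

In this toy layer, total parameters drive memory use while active parameters drive per-token compute, which is the trade-off the sparsity level and the number of shared experts control.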