Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Songhao Wu, Ang Lv, Ruobing Xie, Yankai Lin | June 10, 2026 | arXiv

Key Takeaway

Aligning router weights with the principal singular directions of experts improves MoE routing efficiency. This simple mathematical principle scales from 1B to 11B parameter models.

Summary

This paper improves Mixture-of-Experts (MoE) models by redesigning how routers select experts. The authors propose aligning each expert's router weights with the principal singular direction of that expert, using a technique called Manifold Power Iteration, so that routers better match tokens to appropriate experts.
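To make the idea of a "principal singular direction" concrete, here is a minimal sketch using plain power iteration, which is not necessarily the Manifold Power Iteration variant the paper proposes. It rests on several assumptions not stated in the summary: each expert is represented by a single weight matrix `W` applied as `W x`, the router vector is taken to be the top right singular vector of `W`, and a token is scored by the absolute value of its dot product with that vector (since a singular vector is only defined up to sign). The names `principal_direction` and `route` are hypothetical.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of matrix M."""
    return [list(col) for col in zip(*M)]


def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def principal_direction(W, iters=200, seed=0):
    """Approximate the top right singular vector of W.

    Plain power iteration on W^T W: repeatedly apply W^T W to a random
    unit vector and renormalize. This is an illustrative stand-in, not
    the paper's Manifold Power Iteration.
    """
    rng = random.Random(seed)
    Wt = transpose(W)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(W[0]))])
    for _ in range(iters):
        v = normalize(matvec(Wt, matvec(W, v)))
    return v


def route(token, experts, top_k=1):
    """Return the indices of the top_k experts for a token.

    Assumption: the score is |<token, v_e>|, where v_e is the principal
    direction of expert e. The absolute value removes the sign ambiguity
    of singular vectors.
    """
    scores = []
    for W in experts:
        v = principal_direction(W)
        scores.append(abs(sum(t * x for t, x in zip(token, v))))
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return ranked[:top_k]


# Two toy experts: the first amplifies the first input axis,
# the second amplifies the second input axis.
EXPERTS = [
    [[3.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 3.0]],
]
```

In this toy setup, a token that lies mostly along the first axis is routed to the first expert, because that is the input direction the first expert amplifies most. A practical router would cache or learn these directions rather than recompute them for every token.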

architecture, scaling, efficiency

Key Terms

mixture-of-experts, routing-mechanism, singular-value-decomposition, power-iteration