Complete-muE: Optimal Hyperparameter Transfer and Scaling for MoE Models

Hongwu Peng, Ohiremen Dibua, Yuanjun Xiong, Yifan Gong, Jianming Zhang et al. | May 22, 2026 | arXiv

Key Takeaway

You can now tune hyperparameters on a single dense model and transfer them directly to MoE models of any size or configuration, eliminating the need for expensive hyperparameter search when scaling with MoE.

Summary

Complete-muE is a framework for transferring hyperparameters (such as learning rate and weight decay) from dense neural networks to Mixture-of-Experts (MoE) models without expensive retuning.

training scaling efficiency

Key Terms

mixture-of-experts, hyperparameter-transfer, learning-rate-schedule, router-scale