You don't need separate expert sets per layer in MoE models: a single shared expert pool with independent per-layer routers works better and uses fewer parameters, suggesting the standard per-layer expert allocation is unnecessarily wasteful.
In a standard Mixture-of-Experts design, each layer owns its own set of experts. UniPool replaces these per-layer sets with one pool of experts shared across all layers, where each layer keeps only its own router. This removes cross-layer redundancy and lets expert parameters grow sublinearly with model depth, improving performance while reducing parameter count by 30-60% compared to standard MoE.
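To make the structure concrete, here is a minimal PyTorch sketch of the idea: one expert pool shared by every layer, with only the router being layer-local. The class names, top-k routing, and hyperparameters (`SharedExpertPool`, `PooledMoELayer`, `top_k`, etc.) are illustrative assumptions, not UniPool's actual interface.

```python
# Sketch only: names and routing details are assumptions, not UniPool's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExpertPool(nn.Module):
    """One pool of expert FFNs, shared by every transformer layer."""
    def __init__(self, num_experts: int, d_model: int, d_ff: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

class PooledMoELayer(nn.Module):
    """Routes tokens into the shared pool; only the router is layer-specific."""
    def __init__(self, pool: SharedExpertPool, d_model: int, top_k: int = 2):
        super().__init__()
        self.pool = pool  # shared reference, not a copy
        self.router = nn.Linear(d_model, len(pool.experts))  # per-layer router
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is sent to its top-k experts.
        logits = self.router(x)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k, None] * self.pool.experts[int(e)](x[mask])
        return out

# All layers reuse the same pool, so expert parameters do not grow with depth;
# only the small per-layer routers scale with the number of layers.
pool = SharedExpertPool(num_experts=32, d_model=512, d_ff=2048)
layers = nn.ModuleList(PooledMoELayer(pool, d_model=512) for _ in range(12))
```

Because the pool is a shared module reference, adding layers adds only router parameters (`d_model × num_experts` each), which is where the sublinear growth in expert parameters comes from.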