Toward Calibrated Mixture-of-Experts Under Distribution Shift

Gina Wong, Drew Prinster, Suchi Saria, Rama Chellappa, Anqi Liu | June 18, 2026 | arXiv

Key Takeaway

Expert-level calibration alone is not enough for soft-routed MoE models under distribution shift: the aggregate predictions produced by the routing mechanism must be calibrated explicitly to keep uncertainty estimates trustworthy.

Summary

This paper studies how mixture-of-experts (MoE) models maintain calibrated predictions under distribution shift. The authors show that calibrating individual experts works for hard-routed models but fails for soft-routed ones, and propose an adversarial reweighting method to improve calibration across different routing mechanisms and data distributions.
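To see why calibrated experts need not yield a calibrated soft mixture, consider a toy sketch. It is not the paper's setup or its adversarial reweighting method, and it does not model distribution shift. The four-point population, the fixed 0.5/0.5 gate weights, and the helper `calibration_error` are all assumptions made for illustration. Each expert sees one binary feature and outputs the true conditional probability given that feature, so each is perfectly calibrated on its own, yet their weighted average is not.

```python
import numpy as np

def calibration_error(probs, labels):
    """Exact calibration error on a finite population (hypothetical helper).

    Groups points by distinct predicted value and compares that value with
    the observed positive rate in the group, weighted by group size.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    err = 0.0
    for p in np.unique(probs):
        mask = probs == p
        err += mask.mean() * abs(labels[mask].mean() - p)
    return err

# Toy population (assumption): four equally likely inputs, label y = x1 AND x2.
x1 = np.array([0, 0, 1, 1])
x2 = np.array([0, 1, 0, 1])
y = x1 & x2

# Each expert reports the true P(y = 1 | its own feature).
expert_a = 0.5 * x1
expert_b = 0.5 * x2

# Soft routing with fixed, equal gate weights (assumption).
soft_mix = 0.5 * expert_a + 0.5 * expert_b

err_a = calibration_error(expert_a, y)
err_b = calibration_error(expert_b, y)
err_mix = calibration_error(soft_mix, y)
print(err_a, err_b, err_mix)
```

In this toy case both experts have zero calibration error, while the mixture predicts 0.25 on inputs that are never positive and 0.5 on the input that is always positive. By contrast, a hard router sends each input to a single expert, so if every expert is calibrated on the region routed to it, the combined prediction inherits that calibration.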

training evaluation efficiency

Key Terms

calibration, mixture-of-experts, distribution-shift, routing-mechanism, adversarial-reweighting