Expert-level calibration alone isn't enough for soft-routed MoE models under distribution shift: you need to explicitly calibrate the aggregate predictions produced by the routing mechanism to maintain trustworthy uncertainty estimates.
This paper studies whether mixture-of-experts (MoE) models keep their predictions calibrated under distribution shift. The authors show that calibrating individual experts is sufficient for hard-routed models but not for soft-routed ones, and they propose an adversarial reweighting method that improves calibration across different routing mechanisms and data distributions.
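The gap between hard and soft routing can be illustrated with a toy calculation. The sketch below is not the paper's construction and does not model distribution shift or the adversarial reweighting method; it only shows why per-expert calibration need not carry over to a weighted average. Everything in it is an assumption made for illustration: a binary label, two conditionally independent noisy features that each match the label with probability 0.8, one expert per feature, a hard router that picks an expert with an independent fair coin, and a soft router with fixed 0.5/0.5 weights. The calibration error is computed exactly by enumerating the joint distribution.

```python
from collections import defaultdict
from itertools import product

ACC = 0.8  # assumed probability that each noisy feature equals the label


def expert(x):
    """Calibrated expert: returns P(Y=1 | its own binary feature x)."""
    return ACC if x == 1 else 1.0 - ACC


def joint():
    """Enumerate (probability, y, x1, x2, r) over the toy distribution."""
    for y, x1, x2, r in product((0, 1), repeat=4):
        p = 0.5                              # P(Y = y), balanced labels
        p *= ACC if x1 == y else 1.0 - ACC   # P(X1 = x1 | Y = y)
        p *= ACC if x2 == y else 1.0 - ACC   # P(X2 = x2 | Y = y)
        p *= 0.5                             # r: independent fair coin
        yield p, y, x1, x2, r


def calibration_error(predict):
    """Exact expected |prediction - P(Y=1 | prediction)| over the joint."""
    mass = defaultdict(float)       # probability of each predicted value
    positives = defaultdict(float)  # probability of (predicted value, Y=1)
    for p, y, x1, x2, r in joint():
        v = round(predict(x1, x2, r), 6)
        mass[v] += p
        positives[v] += p * y
    return sum(abs(v * mass[v] - positives[v]) for v in mass)


def hard_routed(x1, x2, r):
    """Hard routing: exactly one expert answers, chosen by the coin r."""
    return expert(x1) if r == 0 else expert(x2)


def soft_routed(x1, x2, r):
    """Soft routing: fixed 0.5/0.5 average of both experts (r is unused)."""
    return 0.5 * expert(x1) + 0.5 * expert(x2)


ece_hard = calibration_error(hard_routed)
ece_soft = calibration_error(soft_routed)
print(f"hard: {ece_hard:.3f}, soft: {ece_soft:.3f}")
```

In this toy setting the hard-routed model inherits the experts' calibration (error 0), while the soft-routed average is underconfident whenever the experts agree: it predicts 0.8 where the true frequency is about 0.94, giving an expected calibration error of about 0.096.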