Despite claims of progress, multimodal domain generalization methods show only marginal gains over simple baselines when compared fairly; the field needs stronger methods and standardized evaluation to move forward.
This paper introduces MMDG-Bench, the first standardized benchmark for multimodal domain generalization, spanning action recognition, fault diagnosis, and sentiment analysis. Evaluating 9 methods on 6 datasets with 7,402 trained models, it shows that recent specialized methods barely outperform simple baselines, that no method works consistently across tasks, and that all methods struggle with corrupted or missing data.