Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study