For climate emulators, optimizing training-data diversity through iterative refinement yields better generalization than simply training on more standard scenarios, even with a smaller dataset.
This paper shows that, for climate AI models, the quality of training data matters more than its quantity. Instead of training on many standard climate scenarios, the researchers develop a method for designing fewer but more diverse scenarios. These scenarios teach models to predict climate behavior more accurately across different conditions, for example by distinguishing the climate effects of greenhouse gases from those of aerosols.