For RL-based reasoning training, which tasks you select for training matters more than how many tasks you use—task-specific selection outperforms averaging strategies, and this insight can guide practical data curation for extending RL to general reasoning domains.
SUPERNOVA is a data curation framework that helps language models learn general reasoning skills (like causal inference and temporal understanding) through reinforcement learning.