SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions

Ashima Suvarna, Kendrick Phan, Mehrab Beikzadeh, Hritik Bansal, Saadia Gabriel | April 9, 2026 | arXiv

Key Takeaway

For RL-based reasoning training, which tasks you select matters more than how many you use: task-specific selection outperforms averaging strategies. This insight can guide practical data curation when extending RL to general reasoning domains.

Summary

SUPERNOVA is a data curation framework that helps language models learn general reasoning skills (like causal inference and temporal understanding) through reinforcement learning.


Key Terms

reinforcement-learning-from-verifiable-rewards, training-data-curation, task-mixing-strategies, instruction-tuning