Task-based reward signals in RL training genuinely improve model capabilities rather than merely amplifying existing patterns: sharpening alone is mathematically unstable and yields only limited gains.
This paper compares two approaches to improving AI models with reinforcement learning: distribution sharpening (making existing output patterns more extreme) versus task-reward learning (teaching new skills). On math tasks, the authors show that sharpening alone produces weak, unstable results, while task rewards enable robust performance gains and stable training.
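To make the distinction concrete, here is a minimal toy sketch (not the paper's actual formulation) of what "distribution sharpening" means: exponentiating a policy's output probabilities and renormalizing. The candidate answers, probabilities, and the `sharpen` helper below are all illustrative assumptions; the sketch shows only why sharpening cannot surface an answer the base policy assigns zero probability.

```python
import numpy as np

def sharpen(p: np.ndarray, temperature: float = 0.5) -> np.ndarray:
    """Return p_i^(1/T) / sum_j p_j^(1/T); T < 1 concentrates mass on existing modes."""
    q = p ** (1.0 / temperature)
    return q / q.sum()

# Hypothetical base policy over 4 candidate answers; suppose the
# correct answer (index 3) gets zero probability from the base model.
base = np.array([0.5, 0.3, 0.2, 0.0])

sharpened = sharpen(base)
# Mass shifts toward the existing mode (index 0), but index 3 stays at 0:
# sharpening amplifies what the model already does, it cannot create
# new behavior the way a task-reward signal can.
assert sharpened[3] == 0.0
```

By contrast, a reward signal computed from task success can assign positive learning signal to behaviors outside the current modes, which is the paper's argument for why task rewards yield genuinely new capability rather than amplified old capability.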