Beyond Distribution Sharpening: The Importance of Task Rewards — ThinkLLM