Learning to Think from Multiple Thinkers

Nirmit Joshi, Roey Magen, Nathan Srebro, Nikolaos Tsilivis, Gal Vardi | April 27, 2026 | arXiv

Key Takeaway

Learning from reasoning traces produced by several different thinkers is harder than learning from a single thinker. The difficulty can be overcome by actively collecting reasoning data from a number of thinkers that grows only logarithmically in the target accuracy, combined with passive final-answer supervision.

Summary

This paper studies how AI models can learn from multiple people or programs solving the same problem in different ways (e.g., different math solutions or code implementations).


Key Terms

chain-of-thought, active-learning, reasoning-trace, end-result-supervision