Learning from diverse reasoning traces is harder than learning from a single thinker, but you can overcome this by actively collecting reasoning data from many thinkers (logarithmic in target accuracy) combined with passive final-answer supervision.
This paper studies how AI models can learn from multiple people or programs solving the same problem in different ways (e.g., different math solutions or code implementations).