Instead of treating all reasoning steps equally when a final answer is wrong, you can use the model to identify which intermediate steps were on the right track. This reduces training variance and improves sample efficiency for reasoning models.
This paper tackles a key problem in training reasoning models: when you can only check whether the final answer is correct, how do you know which steps in the reasoning process actually helped? RREDCoT addresses this by using the model itself to estimate which parts of the reasoning chain deserve more credit, improving training efficiency without extra computation.
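To make the contrast concrete, here is a minimal sketch of outcome-only credit versus step-weighted credit. It is not RREDCoT's actual rule, which this summary does not specify. The function names `uniform_credit` and `step_credit`, the `step_scores` input (a per-step "on the right track" estimate in [0, 1], imagined as coming from the model itself), and the linear weighting are all assumptions made for illustration.

```python
from typing import List


def uniform_credit(num_steps: int, outcome_reward: float) -> List[float]:
    """Baseline: every step inherits the same final-answer reward."""
    return [outcome_reward] * num_steps


def step_credit(step_scores: List[float], outcome_reward: float) -> List[float]:
    """Hypothetical step-weighted credit (an assumed rule, not the paper's).

    step_scores[i] is an assumed model-derived estimate in [0, 1] of how
    much step i was on the right track.
    - Correct answer (reward >= 0): a step earns reward in proportion to its score.
    - Wrong answer (reward < 0): a step is penalized in proportion to (1 - score),
      so steps judged helpful absorb less of the penalty.
    """
    credits = []
    for score in step_scores:
        if outcome_reward >= 0:
            credits.append(outcome_reward * score)
        else:
            credits.append(outcome_reward * (1.0 - score))
    return credits


# A wrong final answer (reward -1.0) over a three-step chain where the
# first two steps look reasonable and the last one does not.
scores = [0.75, 0.5, 0.0]
print(uniform_credit(len(scores), -1.0))  # [-1.0, -1.0, -1.0]
print(step_credit(scores, -1.0))          # [-0.25, -0.5, -1.0]
```

Under the uniform baseline, all three steps are penalized identically even though only the last one went astray. Under the weighted variant, most of the penalty lands on the step with the lowest score, which is the intuition behind the variance reduction described above.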