The Role of Feedback Alignment in Self-Distillation

Semih Kara, Oğuzhan Ersoy | June 9, 2026 | arXiv

Key Takeaway

When training models to improve without external feedback, align the critique with the model's actual reasoning steps. This focuses learning on real errors rather than forcing unnecessary changes to behavior that is already correct.

Summary

This paper studies how to design feedback for self-distillation in language models. The key finding: step-by-step critiques aligned with the model's reasoning trace work better than binary rewards or reference solutions because they target only the tokens where reasoning actually fails, leaving correct steps unchanged.
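To make the token-targeting idea concrete, here is a minimal sketch. It is not the paper's implementation. It assumes that a reasoning trace is split into steps with known token spans, that a critique flags some steps as faulty, and that a per-token loss is already available. The names `token_weights_from_critique` and `masked_mean_loss`, and all numbers, are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple


def token_weights_from_critique(
    step_spans: List[Tuple[int, int]],
    flagged_steps: List[int],
    num_tokens: int,
) -> List[float]:
    """Build a per-token weight vector: 1.0 inside flagged steps, 0.0 elsewhere.

    step_spans holds half-open (start, end) token ranges, one per reasoning step.
    flagged_steps holds the indices of steps the critique marked as faulty.
    """
    weights = [0.0] * num_tokens
    for step_index in flagged_steps:
        start, end = step_spans[step_index]
        for t in range(start, end):
            weights[t] = 1.0
    return weights


def masked_mean_loss(per_token_loss: List[float], weights: List[float]) -> float:
    """Weighted mean of per-token losses; returns 0.0 when no token is selected."""
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(l * w for l, w in zip(per_token_loss, weights))
    return weighted_sum / total_weight


# Hypothetical trace: three reasoning steps over ten tokens.
step_spans = [(0, 3), (3, 7), (7, 10)]
# Assumption: the step-aligned critique flags only the second step.
flagged_steps = [1]

weights = token_weights_from_critique(step_spans, flagged_steps, num_tokens=10)

# Made-up per-token losses: higher on the faulty step.
per_token_loss = [0.5] * 3 + [2.0] * 4 + [0.5] * 3

# Aligned feedback: only tokens in the flagged step contribute.
aligned_loss = masked_mean_loss(per_token_loss, weights)
# Sequence-level signal (e.g. a binary reward): every token contributes equally.
uniform_loss = masked_mean_loss(per_token_loss, [1.0] * 10)
```

In this toy setup the aligned mask leaves the tokens of the first and third steps without any training signal, while the uniform weighting spreads the signal over all ten tokens, including the ones that were already correct.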

training, reasoning

Key Terms

self-distillation, reasoning-trace, token-credit-assignment, grpo