Self-distillation trades diversity for accuracy: models become overconfident in their preferred solutions, hurting performance on out-of-distribution tasks that need varied strategies.
This paper reveals a hidden cost of on-policy self-distillation: although it achieves high average accuracy, it reduces output diversity by amplifying the model's existing biases. The authors show theoretically and empirically that self-distillation concentrates probability mass on dominant modes, which flattens pass@k curves: sampling more rollouts no longer improves accuracy as much as it otherwise would.
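To make the flattening effect concrete, the sketch below compares two hypothetical models with the same pass@1 but different diversity. It uses the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct. The summary does not say which estimator the authors use, so treat that choice as an assumption. The per-problem counts and the names `concentrated` and `diverse` are invented for illustration and are not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over problems, given correct-sample counts per problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical correct-sample counts out of N rollouts on two problems.
N = 16
concentrated = [16, 0]  # always right on one problem, never on the other
diverse = [8, 8]        # right half the time on both problems

for k in (1, 4, 16):
    print(
        f"k={k:2d}  concentrated={mean_pass_at_k(concentrated, N, k):.3f}  "
        f"diverse={mean_pass_at_k(diverse, N, k):.3f}"
    )
```

Both toy models score 0.5 at k = 1, but only the diverse one improves as k grows. The concentrated model's curve stays flat because extra rollouts keep landing on the same dominant mode.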