On-Policy Self-Distillation with Sampled Demonstrations Reduces Output Diversity — ThinkLLM