DemoPSD: Disagreement-Modulated Policy Self-Distillation — ThinkLLM