DOPD: Dual On-policy Distillation — ThinkLLM