Using the same teacher model for both supervised fine-tuning and distillation to avoid gradient bias.
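The idea of sharing one teacher between the hard-label (SFT) term and the soft-label distillation term can be sketched as a combined loss. This is a minimal illustration, not the source's implementation: the interpolation weight `alpha`, the temperature `T`, and the `distillation_loss` helper are all assumed names, and the standard `T^2` scaling on the KL term follows common distillation practice.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of raw logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    # Hard-label cross-entropy: the supervised fine-tuning term.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])

    # Soft-label KL term against the SAME teacher's temperature-scaled
    # outputs, so both loss terms are driven by one consistent model.
    p_teacher_t = softmax(teacher_logits, temperature)
    p_student_t = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher_t, p_student_t))

    # T^2 scaling keeps the soft-target gradients comparable in
    # magnitude to the hard-label gradients (alpha is illustrative).
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl

loss = distillation_loss([2.0, 0.5, -1.0], [1.8, 0.7, -0.9], label=0)
```

When the student exactly matches the teacher, the KL term vanishes and only the supervised cross-entropy contributes, which is one way to see why mixing targets from two different teachers would pull the gradient in inconsistent directions.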