When distilling from multiple teachers for summarization, simpler logit-level knowledge distillation is more reliable than more complex distillation schemes, and teacher agreement should guide when to trust teacher signals over ground-truth supervision.
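As a concrete illustration of logit-level distillation from multiple teachers, the sketch below averages a temperature-softened KL term over the teachers. This is a minimal example under common distillation conventions, not the paper's exact formulation; the function name, temperature value, and averaging over teachers are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def logit_distillation_loss(student_logits, teacher_logits_list, temperature=2.0):
    """Logit-level KD: KL divergence between temperature-softened teacher and
    student token distributions, averaged over the teachers.

    student_logits:      (batch, seq_len, vocab)
    teacher_logits_list: list of tensors shaped like student_logits
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    loss = 0.0
    for teacher_logits in teacher_logits_list:
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        # Standard scaling by T^2 so gradients stay comparable to hard-label CE.
        loss = loss + F.kl_div(log_p_student, p_teacher,
                               reduction="batchmean") * temperature ** 2
    return loss / len(teacher_logits_list)
```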
This paper improves knowledge distillation for low-resource abstractive summarization by making better use of multiple teacher models. It introduces methods that route the student's supervision between teacher guidance and ground-truth labels based on how strongly the teachers agree, and that constrain how the student relates to the different teachers.
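A rough sketch of agreement-based routing is given below: where two teachers agree on the next token, the student is pushed toward the (averaged) teacher distribution; where they disagree, it falls back to ground-truth cross-entropy. The per-token argmax-match agreement heuristic, the 0.5/0.5 teacher averaging, and the padding handling are assumptions for illustration, not the paper's specific mechanism.

```python
import torch
import torch.nn.functional as F

def agreement_routed_loss(student_logits, teacher_logits_a, teacher_logits_b,
                          target_ids, temperature=2.0, pad_id=0):
    """Per-token routing between teacher distillation and ground-truth
    cross-entropy, gated by whether the two teachers agree on the next token."""
    # Agreement mask: 1.0 where both teachers predict the same token.
    agree = (teacher_logits_a.argmax(-1) == teacher_logits_b.argmax(-1)).float()

    # Distillation term: per-token KL to the averaged teacher distribution.
    p_teacher = 0.5 * (F.softmax(teacher_logits_a / temperature, dim=-1)
                       + F.softmax(teacher_logits_b / temperature, dim=-1))
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_per_token = (p_teacher * (p_teacher.clamp_min(1e-8).log()
                                 - log_p_student)).sum(-1) * temperature ** 2

    # Ground-truth term: standard token-level cross-entropy.
    ce_per_token = F.cross_entropy(student_logits.transpose(1, 2), target_ids,
                                   reduction="none")

    # Route: trust teachers where they agree, ground truth where they disagree.
    mask = (target_ids != pad_id).float()
    per_token = agree * kd_per_token + (1.0 - agree) * ce_per_token
    return (per_token * mask).sum() / mask.sum().clamp_min(1.0)
```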