When distilling from multiple teachers for summarization, simpler logit-level knowledge distillation is more reliable than more complex distillation schemes, and teacher agreement should guide when to trust teacher signals over ground-truth supervision.
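As a concrete illustration of logit-level distillation from multiple teachers, the sketch below averages a temperature-softened KL term over the teachers. This is a minimal example under common distillation conventions, not the paper's exact formulation; the function name, temperature value, and averaging over teachers are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def logit_distillation_loss(student_logits, teacher_logits_list, temperature=2.0):
    """Logit-level KD: KL divergence between temperature-softened teacher and
    student token distributions, averaged over the teachers.

    student_logits:      (batch, seq_len, vocab)
    teacher_logits_list: list of tensors shaped like student_logits
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    loss = 0.0
    for teacher_logits in teacher_logits_list:
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        # Standard scaling by T^2 so gradients stay comparable to hard-label CE.
        loss = loss + F.kl_div(log_p_student, p_teacher,
                               reduction="batchmean") * temperature ** 2
    return loss / len(teacher_logits_list)
```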
This paper improves knowledge distillation for low-resource abstractive summarization by making better use of multiple teacher models. It introduces methods that route the student's supervision between teacher guidance and ground-truth labels based on how strongly the teachers agree, and that constrain how the student relates to the different teachers.
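A rough sketch of agreement-based routing is given below: where two teachers agree on the next token, the student is pushed toward the (averaged) teacher distribution; where they disagree, it falls back to ground-truth cross-entropy. The per-token argmax-match agreement heuristic, the 0.5/0.5 teacher averaging, and the padding handling are assumptions for illustration, not the paper's specific mechanism.

```python
import torch
import torch.nn.functional as F

def agreement_routed_loss(student_logits, teacher_logits_a, teacher_logits_b,
                          target_ids, temperature=2.0, pad_id=0):
    """Per-token routing between teacher distillation and ground-truth
    cross-entropy, gated by whether the two teachers agree on the next token."""
    # Agreement mask: 1.0 where both teachers predict the same token.
    agree = (teacher_logits_a.argmax(-1) == teacher_logits_b.argmax(-1)).float()

    # Distillation term: per-token KL to the averaged teacher distribution.
    p_teacher = 0.5 * (F.softmax(teacher_logits_a / temperature, dim=-1)
                       + F.softmax(teacher_logits_b / temperature, dim=-1))
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_per_token = (p_teacher * (p_teacher.clamp_min(1e-8).log()
                                 - log_p_student)).sum(-1) * temperature ** 2

    # Ground-truth term: standard token-level cross-entropy.
    ce_per_token = F.cross_entropy(student_logits.transpose(1, 2), target_ids,
                                   reduction="none")

    # Route: trust teachers where they agree, ground truth where they disagree.
    mask = (target_ids != pad_id).float()
    per_token = agree * kd_per_token + (1.0 - agree) * ce_per_token
    return (per_token * mask).sum() / mask.sum().clamp_min(1.0)
```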