Using proxy models as intermediaries between diverse teachers prevents conflicting gradients and lets the student learn richer egocentric representations from heterogeneous knowledge sources, outperforming naive multi-teacher distillation.
This paper introduces UNIEGO, a unified egocentric video encoder trained through a novel multi-teacher distillation framework.