Transfer learning with auxiliary tasks provably helps only under specific conditions. This paper gives exact formulas for checking those conditions and for optimally combining auxiliary and main tasks in linear settings.
This paper provides theoretical guarantees for when auxiliary data helps in transfer learning. For linear regression, the authors derive exact formulas characterizing when and by how much auxiliary tasks improve performance. For linear neural networks with shared representations, they establish the first non-vacuous conditions under which auxiliary learning is beneficial and show how to optimally weight the different tasks.
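To make the task-weighting idea concrete, here is a minimal sketch (not the paper's exact estimator; the weight `alpha`, the data sizes, and the closeness of the auxiliary task are all illustrative assumptions): a weighted least-squares fit that pools a small main-task dataset with a larger auxiliary dataset, where `alpha` controls how strongly the auxiliary task influences the solution.

```python
import numpy as np

# Illustrative setup (assumed, not from the paper): the auxiliary task has
# parameters close to the main task's, so pooling can reduce variance at the
# cost of some bias.
rng = np.random.default_rng(0)
d = 10
w_main = rng.normal(size=d)                 # true main-task parameters
w_aux = w_main + 0.1 * rng.normal(size=d)   # nearby auxiliary-task parameters

X_main = rng.normal(size=(30, d))           # few main-task samples
y_main = X_main @ w_main + 0.5 * rng.normal(size=30)
X_aux = rng.normal(size=(500, d))           # many auxiliary samples
y_aux = X_aux @ w_aux + 0.5 * rng.normal(size=500)

def weighted_fit(alpha):
    """Minimize ||y_main - X_main w||^2 + alpha * ||y_aux - X_aux w||^2.

    Closed form: w = (Xm'Xm + alpha * Xa'Xa)^{-1} (Xm'ym + alpha * Xa'ya).
    alpha = 0 recovers ordinary least squares on the main task alone.
    """
    gram = X_main.T @ X_main + alpha * (X_aux.T @ X_aux)
    rhs = X_main.T @ y_main + alpha * (X_aux.T @ y_aux)
    return np.linalg.solve(gram, rhs)

w_hat = weighted_fit(alpha=0.05)
```

Sweeping `alpha` traces the bias-variance trade-off the paper analyzes: small weights leave main-task variance high, while large weights pull the estimate toward the auxiliary task's (slightly different) parameters.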