Combining multiple loss functions (e.g., language modeling and distillation) during training as a weighted sum, with each loss scaled by its own weight.
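As a rough illustration, the weighted combination can be sketched in plain Python. This is a minimal sketch under stated assumptions, not a reference implementation: the function names (`lm_loss`, `distillation_loss`, `combined_loss`), the 0.5/0.5 weights, the temperature of 2.0, and the toy logits are all hypothetical choices made for the example. It assumes the language modeling term is a cross-entropy against the true token and the distillation term is a KL divergence between temperature-softened teacher and student distributions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def lm_loss(student_logits, target_index):
    """Cross-entropy of the student's prediction against the true token."""
    probs = softmax(student_logits)
    return -math.log(probs[target_index])


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def combined_loss(losses, weights):
    """Weighted sum of named loss terms (weights are assumed hyperparameters)."""
    if set(losses) != set(weights):
        raise ValueError("losses and weights must have the same keys")
    return sum(weights[name] * losses[name] for name in losses)


# Toy logits over a three-token vocabulary (illustrative values only).
student = [2.0, 0.5, -1.0]
teacher = [2.5, 0.0, -1.5]

losses = {
    "lm": lm_loss(student, target_index=0),
    "distill": distillation_loss(student, teacher),
}
weights = {"lm": 0.5, "distill": 0.5}  # hypothetical proportions

total = combined_loss(losses, weights)
print(f"lm={losses['lm']:.4f} distill={losses['distill']:.4f} total={total:.4f}")
```

Keeping the individual terms in a dictionary makes it easy to log each loss separately and to change the proportions without touching the loss definitions. In an actual training loop the same weighted sum would be computed on framework tensors so that gradients flow through every term.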