Transferring knowledge between models with fundamentally different designs, attention mechanisms, or tokenizers.