Performance difference when a trained policy transfers between two different environment implementations.
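
One way to make this definition concrete is to evaluate the same fixed policy in both implementations and subtract the results. The sketch below is only an illustration under stated assumptions: performance is taken to be mean episodic return, the gap is computed as source minus target, and `ToyEnv`, `mean_return`, `transfer_gap`, and the `gain` parameter are hypothetical names invented for this example, not part of any particular library.

```python
from statistics import mean
from typing import Callable, Tuple


class ToyEnv:
    """Hypothetical 1-D task: drive the state toward zero within a fixed horizon.

    `gain` stands in for an implementation detail that differs between two
    versions of the same task (here, how strongly an action is applied).
    """

    def __init__(self, gain: float, horizon: int = 5) -> None:
        self.gain = gain
        self.horizon = horizon
        self.state = 0.0
        self.t = 0

    def reset(self) -> float:
        self.state = 1.0
        self.t = 0
        return self.state

    def step(self, action: float) -> Tuple[float, float, bool]:
        self.state += self.gain * action
        self.t += 1
        reward = -abs(self.state)  # closer to zero is better
        return self.state, reward, self.t >= self.horizon


def mean_return(policy: Callable[[float], float], env: ToyEnv, episodes: int = 10) -> float:
    """Average undiscounted episodic return of `policy` in `env`."""
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
        returns.append(total)
    return mean(returns)


def transfer_gap(policy: Callable[[float], float], source_env: ToyEnv,
                 target_env: ToyEnv, episodes: int = 10) -> float:
    """Assumed sign convention: positive means performance dropped after transfer."""
    return mean_return(policy, source_env, episodes) - mean_return(policy, target_env, episodes)


def policy(obs: float) -> float:
    # Stand-in for a trained policy: optimal when gain == 1.0.
    return -obs


gap = transfer_gap(policy, ToyEnv(gain=1.0), ToyEnv(gain=0.5))
print(gap)  # 0.96875 with these toy settings
```

With stochastic environments or policies, the two means would need enough episodes (and ideally confidence intervals) for the difference to be meaningful; the toy setup here is deterministic so that the gap is exact.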