A parameterization method that keeps optimal learning rates approximately constant across different model sizes.