The learning rate applied specifically to the embedding layer. It can be set independently of the learning rate used for the other layers.
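
As a rough illustration of what a separate embedding learning rate means in practice, the sketch below applies one plain SGD step in which the embedding parameter uses its own rate and every other parameter falls back to a shared base rate. The names `sgd_step`, `base_lr`, and `embedding_lr` are hypothetical and chosen for this example only; they are not taken from any particular library, and real parameters would be tensors rather than single floats.

```python
# Minimal sketch, assuming plain SGD and scalar parameters.
# All names here (sgd_step, base_lr, embedding_lr) are hypothetical.

def sgd_step(params, grads, base_lr, per_layer_lr=None):
    """Return updated parameters after one SGD step.

    Each parameter uses the rate in per_layer_lr if its name appears
    there, and base_lr otherwise.
    """
    per_layer_lr = per_layer_lr or {}
    return {
        name: value - per_layer_lr.get(name, base_lr) * grads[name]
        for name, value in params.items()
    }


base_lr = 0.5        # rate shared by all layers without an override
embedding_lr = 0.25  # rate applied only to the embedding layer

params = {"embedding": 1.0, "dense": 1.0}
grads = {"embedding": 2.0, "dense": 2.0}

updated = sgd_step(params, grads, base_lr, {"embedding": embedding_lr})
print(updated)
```

With identical starting values and gradients, the embedding parameter moves half as far as the dense parameter, because its rate is half the base rate.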