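Problem in RNN training where gradients backpropagated through time shrink toward zero (vanish) or grow without bound (explode) as sequences get longer, because each time step multiplies the gradient by another Jacobian. Vanishing gradients make it hard to learn long-range dependencies; exploding gradients make training numerically unstable.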
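The effect can be illustrated with a minimal sketch, assuming a purely linear recurrence h_t = W h_{t-1} in which W is a scaled orthogonal matrix. The function name `backprop_norms` and its parameters are hypothetical and chosen for illustration only. Under that assumption each backward step multiplies the gradient norm by exactly `scale`, so the norm decays or grows geometrically with the number of steps.

```python
import numpy as np

def backprop_norms(scale, steps=50, size=8, seed=0):
    """Return the gradient norm after each backward step through a linear RNN.

    Assumption: h_t = W h_{t-1} with W = scale * Q, where Q is orthogonal,
    so every backward step rescales the gradient norm by exactly `scale`.
    """
    rng = np.random.default_rng(seed)
    # The QR decomposition of a random square matrix yields an orthogonal Q.
    q, _ = np.linalg.qr(rng.standard_normal((size, size)))
    w = scale * q
    grad = np.ones(size) / np.sqrt(size)  # unit-norm gradient at the final step
    norms = []
    for _ in range(steps):
        grad = w.T @ grad  # one step of backpropagation through time
        norms.append(np.linalg.norm(grad))
    return norms

vanishing = backprop_norms(scale=0.9)  # norm shrinks like 0.9 ** t
exploding = backprop_norms(scale=1.1)  # norm grows like 1.1 ** t
print(f"scale 0.9: norm after 50 steps = {vanishing[-1]:.2e}")
print(f"scale 1.1: norm after 50 steps = {exploding[-1]:.2e}")
```

Real RNNs apply a nonlinearity at each step and their weight matrices are not scaled orthogonal matrices, so the growth or decay is not exactly geometric in practice. The sketch only shows why a long product of per-step Jacobians tends toward one extreme or the other.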