When a model trained with sparse rewards gets stuck early in training because its initial probability of success is too low to produce any reward signal to learn from.
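
To make the failure mode concrete, here is a minimal sketch of the arithmetic behind it. It assumes a binary (success/failure) reward, independent rollouts, and a fixed per-rollout success probability `p`; the function names `prob_any_success` and `expected_batches_until_signal` are hypothetical and introduced only for illustration.

```python
import math


def prob_any_success(p: float, n: int) -> float:
    """Probability that at least one of n independent rollouts succeeds.

    Assumption: each rollout succeeds independently with probability p.
    """
    return 1.0 - (1.0 - p) ** n


def expected_batches_until_signal(p: float, n: int) -> float:
    """Expected number of batches of size n sampled until one contains a success.

    This is the mean of a geometric distribution with per-batch success
    probability prob_any_success(p, n). Returns math.inf when that
    probability is zero, i.e. the model never sees a nonzero reward.
    """
    q = prob_any_success(p, n)
    return math.inf if q == 0.0 else 1.0 / q


if __name__ == "__main__":
    # Illustrative values only, not measurements from any real training run.
    for p in (1e-1, 1e-3, 1e-5):
        q = prob_any_success(p, n=64)
        batches = expected_batches_until_signal(p, n=64)
        print(f"p={p:g}  P(any success in batch)={q:.6f}  expected batches={batches:.1f}")
```

Under these assumptions, as `p` shrinks, most batches contain no successful rollout at all, so the reward is identically zero across the batch and there is nothing to distinguish better attempts from worse ones.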