The point during training where reward signals stop improving, indicating the model may be memorizing rather than generalizing.