When a model trained with RL produces repetitive, similar outputs instead of varied responses, reducing usefulness.