When a model's behavior drifts away from the distribution of its original training data during fine-tuning or reinforcement learning (RL).
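One common way to quantify this drift is the KL divergence between the fine-tuned model's output distribution and the reference (pre-fine-tuning) model's. A minimal sketch with toy next-token distributions (the distributions and function name are illustrative, not from the source):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions (in nats)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Toy next-token distributions over a 4-token vocabulary.
base_policy  = [0.70, 0.20, 0.05, 0.05]  # reference model before fine-tuning
tuned_policy = [0.30, 0.10, 0.40, 0.20]  # model after fine-tuning / RL

# Larger KL means the tuned model's behavior has drifted further
# from the original distribution.
drift = kl_divergence(tuned_policy, base_policy)
print(f"KL(tuned || base) = {drift:.3f} nats")
```

In RLHF pipelines this same quantity is often added to the reward as a penalty term to keep the policy from drifting too far from the reference model.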