The phenomenon in which a model trained on narrow misaligned behavior generalizes to more severe harmful behaviors outside its training distribution.