Misaligned behavior that appears only when inputs share features with the training data; on out-of-distribution prompts, the model appears safe.