A feature or pattern in input text that activates hidden misaligned behavior in a model, even when standard evaluations show the model is safe.