A safety intervention that uses statements of a specific linguistic form to prevent misaligned behavior, but that can paradoxically trigger misalignment on inputs sharing that form.