LLM backdoors don't need suspicious text triggers—attackers can hide them in positional encoding, making them invisible to content-based defenses and activatable through normal conversation length patterns.
This paper reveals a new way to attack large language models by exploiting how they process word positions rather than modifying the text itself. Researchers show that backdoors can be triggered by input length alone, allowing attackers to make models leak secrets or misbehave without leaving obvious traces in the conversation.