MetaBackdoor: Exploiting Positional Encoding as a Backdoor Attack Surface in LLMs

Rui Wen, Mark Russinovich, Andrew Paverd, Jun Sakuma, Ahmed Salem | May 14, 2026 | arXiv

Key Takeaway

LLM backdoors do not need suspicious text triggers: attackers can hide them in positional encoding, where they are invisible to content-based defenses and can be activated by ordinary conversation lengths.

Summary

This paper presents a new way to attack large language models that exploits how they encode token positions rather than modifying the text itself. The authors show that a backdoor can be triggered by input length alone, letting an attacker make a model leak secrets or misbehave without leaving obvious traces in the conversation.
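
To make the idea concrete, the following toy sketch shows why a signal carried by position is hard for a content-based check to see. It is not the paper's method: the classic sinusoidal encoding is used only as a familiar stand-in (many current LLMs use other schemes), and the names `content_filter_flags`, `toy_length_condition`, the placeholder trigger string, and the threshold of 6 tokens are all hypothetical choices made for illustration.

```python
import math


def sinusoidal_position_encoding(position: int, d_model: int = 8) -> list[float]:
    """Classic sinusoidal encoding: depends only on the position index."""
    encoding = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        encoding.append(math.sin(angle))
        encoding.append(math.cos(angle))
    return encoding


def content_filter_flags(tokens, suspicious=("cf_trigger",)):
    """Hypothetical content-based defense: looks for known trigger strings."""
    return any(token in suspicious for token in tokens)


def toy_length_condition(tokens, threshold=6):
    """Hypothetical stand-in for a position-dependent condition.

    It fires once the input reaches `threshold` tokens, whatever they say.
    """
    return len(tokens) >= threshold


short_message = ["hello", "world"]
long_message = ["please", "summarize", "this", "ordinary", "benign", "message"]

# The encoding at a given index is identical no matter which token sits there,
# so inspecting the text reveals nothing about a position-based signal.
encoding_at_last_index = sinusoidal_position_encoding(len(long_message) - 1)

filter_result = content_filter_flags(long_message)
length_result = toy_length_condition(long_message)
```

For `long_message`, `content_filter_flags` returns `False` because no token matches a known trigger, while `toy_length_condition` returns `True` purely because of the input length. A real backdoor would be learned into model weights rather than written as an explicit check; the sketch only illustrates the gap between what a text scanner inspects and what position carries.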

safety

Key Terms

backdoor-attack positional-encoding trigger transformer-architecture