Backdoor attacks on multimodal AI models can be made significantly stealthier by generating context-aware poisoned outputs rather than fixed patterns—a critical finding for securing VLMs in production.
This paper reveals that existing backdoor attacks on Vision-Language Models are easier to detect than previously thought, and introduces Phantasia, a new attack that generates contextually appropriate malicious responses instead of fixed patterns, making it much harder to spot while maintaining normal performance.