AI agents are vulnerable to attacks through the skills/tools they use, and current defenses don't reliably protect against poisoned skills that either attack immediately or mutate silently over time.
This paper introduces SkillHarm, a benchmark for testing how AI agents can be attacked through poisoned skills (tools/functions they use). It covers two attack types: fixed poisoned skills and skills that secretly mutate over time. The researchers built an automated system to generate 879 realistic attack samples and found that current agents are vulnerable to these attacks 69-86% of the time.