SkillHarm: Lifecycle-Aware Skill-Based Attacks via Automated Construction

Yuting Ning, Zhehao Zhang, Yash Kumar Lal, Boyu Gou, Junyi Li et al.|June 1, 2026arXiv

Key Takeaway

AI agents are vulnerable to attacks through the skills/tools they use, and current defenses don't reliably protect against poisoned skills that either attack immediately or mutate silently over time.

Summary

This paper introduces SkillHarm, a benchmark for testing how AI agents can be attacked through poisoned skills (tools/functions they use). It covers two attack types: fixed poisoned skills and skills that secretly mutate over time. The researchers built an automated system to generate 879 realistic attack samples and found that current agents are vulnerable to these attacks 69-86% of the time.

safety agents evaluation

Key Terms

agent-skill skill-lifecycle poisoned-skill self-mutating-poisoning