Stateful Online Monitoring Catches Distributed Agent Attacks

Davis Brown, Samarth Bhargav, Arav Santhanam, Kasper Hong, Ivan Zhang et al. | May 29, 2026 | arXiv

Key Takeaway

Safety monitors that only check individual user sessions miss coordinated attacks split across accounts. Catching distributed agent misuse requires tracking patterns across groups of users.

Summary

Language models can help attackers find vulnerabilities, and attackers are increasingly splitting harmful tasks across multiple accounts so that no single session looks suspicious enough to be detected.
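
To make the gap between per-session and cross-account monitoring concrete, here is a minimal Python sketch. It is not the paper's method: the `Session` fields, both thresholds, and the choice to group sessions by an exact `target` key (standing in for any real clustering step) are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    account: str      # hypothetical account identifier
    target: str       # hypothetical grouping key, e.g. the system being probed
    suspicion: float  # hypothetical per-session score in [0, 1]


SESSION_THRESHOLD = 0.8  # assumed cutoff for a per-session monitor
GROUP_THRESHOLD = 1.5    # assumed cutoff for accumulated group suspicion


def stateless_flags(sessions):
    """Flag accounts whose own session score crosses the per-session cutoff."""
    return [s.account for s in sessions if s.suspicion >= SESSION_THRESHOLD]


class StatefulMonitor:
    """Accumulates suspicion per grouping key as sessions arrive one by one."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.accounts = defaultdict(set)

    def observe(self, session):
        """Update group state; return the involved accounts if the group trips."""
        self.totals[session.target] += session.suspicion
        self.accounts[session.target].add(session.account)
        # Only alert when more than one account contributes to the group.
        distributed = len(self.accounts[session.target]) > 1
        if distributed and self.totals[session.target] >= GROUP_THRESHOLD:
            return sorted(self.accounts[session.target])
        return []


# Three accounts each do a moderately suspicious piece of work on one target.
sessions = [
    Session("acct-a", "host-1", 0.5),
    Session("acct-b", "host-1", 0.5),
    Session("acct-c", "host-1", 0.5),
    Session("acct-d", "host-2", 0.25),
]

monitor = StatefulMonitor()
alerts = [monitor.observe(s) for s in sessions]

print(stateless_flags(sessions))
print(alerts)
```

In this toy input, no session crosses the per-session cutoff on its own, so the stateless check flags nothing, while the stateful monitor raises an alert once the third account's session arrives for the same target. A real system would need a more robust way to group related sessions than an exact key match.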

safety agents evaluation

Key Terms

distributed-attack stateful-monitor agent-scaffold real-time-clustering