Current AI agents struggle with real-world web tasks that require document understanding, multi-step navigation, and detailed form-filling; even frontier models succeed on less than 40% of everyday online tasks.
ClawBench is a benchmark of 153 real-world online tasks, such as booking appointments, filling out forms, and submitting applications, spanning 144 live websites; it tests whether AI agents can handle everyday online work.