ASMR-Bench: Auditing for Sabotage in ML Research

Eric Gan, Aryan Bhatt, Buck Shlegeris, Julian Stastny, Vivek Hebbar|April 17, 2026arXiv

Key Takeaway

Current AI systems and auditors are poor at detecting subtle sabotage in research code—even frontier LLMs only catch 77% of cases—highlighting a critical gap in oversight for autonomous AI research.

Summary

This paper introduces ASMR-Bench, a benchmark for testing whether AI systems and human auditors can detect sabotage hidden in ML research code. The benchmark includes 9 real ML projects with intentionally introduced bugs that change experimental results while keeping the paper's description accurate.

safety evaluation agents

Key Terms

sabotage auditing red-teaming auroc