EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis

Harihara Muralidharan, Reema Baskar, Soo Hee Lee, Tim Proctor, Kenny Workman | June 11, 2026 | arXiv

Key Takeaway

Current AI agents fail at domain-specific scientific reasoning in genomics: they can locate data and perform calculations, but lack the deeper understanding needed to make correct analytical decisions for specialized assays.

Summary

EpiBench is a benchmark that tests whether AI agents can carry out epigenomics analysis tasks, such as analyzing sequencing data from CUT&Tag, ATAC-seq, and ChIP-seq experiments, by making correct analytical decisions and returning verifiable answers.

evaluation, agents, applications

Key Terms

agentic-tasks, verifiable-answers, benchmark, domain-specific-knowledge