Current AI agents fail at domain-specific scientific reasoning in genomics: they can locate data and perform calculations, but lack the deeper understanding needed to make correct analytical decisions for specialized assays.
EpiBench is a benchmark that tests whether AI agents can carry out epigenomics analysis tasks, such as analyzing sequencing data from CUT&Tag, ATAC-seq, and ChIP-seq experiments, by making correct analytical decisions and returning verifiable answers.