LLMs can automate reproducibility assessment in the social sciences at scale, matching or exceeding human reanalysts at verifying whether published findings hold up when the original data are reanalyzed.
Researchers tested whether large language models can automatically assess the reproducibility of published social science studies by having the models reanalyze the original data and compare their results with the reported findings.
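The comparison step can be illustrated with a minimal sketch. The summary does not say how a reanalysis is judged to "match" the original, so the criterion below is a hypothetical rule chosen for illustration. It treats a finding as reproduced when three conditions hold: the reanalyzed coefficient has the same sign, it reaches the same significance verdict at `alpha`, and it lies within a relative tolerance `rel_tol` of the published value. The names `Estimate`, `reproduces`, `rel_tol`, and `alpha` are all assumptions, not the study's actual method.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """A single reported effect: point estimate and its p-value."""
    coefficient: float
    p_value: float


def reproduces(original: Estimate, reanalysis: Estimate,
               rel_tol: float = 0.05, alpha: float = 0.05) -> bool:
    """Hypothetical reproducibility check (assumed criterion, not the study's).

    A finding counts as reproduced if the reanalysis has the same sign,
    the same significance verdict at `alpha`, and a coefficient within
    `rel_tol` relative difference of the published one.
    """
    same_sign = (original.coefficient > 0) == (reanalysis.coefficient > 0)
    same_significance = (original.p_value < alpha) == (reanalysis.p_value < alpha)
    if original.coefficient == 0:
        # Relative difference is undefined; fall back to an absolute bound.
        close = abs(reanalysis.coefficient) <= rel_tol
    else:
        diff = abs(reanalysis.coefficient - original.coefficient)
        close = diff <= rel_tol * abs(original.coefficient)
    return same_sign and same_significance and close


if __name__ == "__main__":
    published = Estimate(coefficient=0.42, p_value=0.01)
    print(reproduces(published, Estimate(0.41, 0.02)))  # True
    print(reproduces(published, Estimate(0.30, 0.12)))  # False
```

Folding sign, significance, and magnitude into one boolean keeps each verdict easy to compare with a human reanalyst's judgment. A real assessment would likely report these components separately.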