Automated reproducibility assessments in the social and behavioral sciences using large language models

Tobias Holtdirk, Pietro Marcolongo, Anna Steinberg Schulten, Felix Henninger, Stefan Rose et al. | June 11, 2026 | arXiv

Key Takeaway

LLMs can automate reproducibility assessment in the social and behavioral sciences at scale, matching or exceeding human reanalysts' ability to verify whether published findings hold up when the original data are reanalyzed.

Summary

The authors test whether large language models can automatically check the reproducibility of published social science studies: the LLM reanalyzes each study's data and compares its results with the originally reported findings.

evaluation applications

Key Terms

reproducibility effect-size cohens-d qualitative-conclusion
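
To make the comparison step concrete, here is a minimal sketch of checking a reanalyzed effect size against a reported one. It is not the authors' pipeline. It assumes a two-group design summarized by Cohen's d with a pooled standard deviation, and the function names, the `tolerance` threshold, and the sign-based reading of "qualitative conclusion" are all hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def effect_size_matches(reported_d, reanalyzed_d, tolerance=0.05):
    """Hypothetical numeric check: do the two effect sizes agree within a tolerance?

    The tolerance is an illustrative assumption, not a value from the paper.
    """
    return abs(reported_d - reanalyzed_d) <= tolerance


def same_qualitative_conclusion(reported_d, reanalyzed_d):
    """Hypothetical qualitative check: do both effects point in the same direction?"""
    return (reported_d > 0) == (reanalyzed_d > 0)


# Toy data, invented for this example only.
treatment = [2, 4, 6]
control = [1, 3, 5]
reanalyzed = cohens_d(treatment, control)
print(effect_size_matches(0.5, reanalyzed))
print(same_qualitative_conclusion(0.5, reanalyzed))
```

Separating the numeric check from the qualitative one reflects that a reanalysis can support the same conclusion while differing slightly in magnitude. How strict either criterion should be is a judgment call that this sketch does not settle.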