ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues

Shanda Li, Qiuhong Anna Wei, Jingwu Tang, Valerie Chen, Nihar B Shah et al. | June 16, 2026 | arXiv

Key Takeaway

LLM agents can effectively identify reproducibility issues in ML papers by analyzing their code repositories, even without executing the code, but they struggle to localize those problems precisely.

Summary

ReproRepo is a scalable framework that uses GitHub issues as naturally occurring ground truth for evaluating whether AI agents can identify reproducibility problems in machine learning research. Across 1,149 papers, the best agent found at least one relevant issue for about 90% of them, suggesting that LLMs can spot real-world reproducibility blockers without running any code.
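The headline number is a paper-level metric: a paper counts as covered if the agent reports at least one finding that matches a real GitHub issue from its repository. The sketch below illustrates only that aggregation step, under stated assumptions. The token-overlap (Jaccard) matcher and the 0.3 threshold are hypothetical stand-ins for whatever semantic matching ReproRepo actually uses, and `PaperResult`, `paper_hit`, and `hit_rate` are illustrative names, not the framework's API.

```python
from dataclasses import dataclass, field


@dataclass
class PaperResult:
    """Hypothetical per-paper record: agent findings vs. reference GitHub issues."""
    paper_id: str
    agent_findings: list[str] = field(default_factory=list)
    github_issues: list[str] = field(default_factory=list)


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; an assumed, crude proxy for semantic matching."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def paper_hit(result: PaperResult, threshold: float = 0.3) -> bool:
    """True if any agent finding matches any reference issue above the threshold."""
    return any(
        jaccard(finding, issue) >= threshold
        for finding in result.agent_findings
        for issue in result.github_issues
    )


def hit_rate(results: list[PaperResult], threshold: float = 0.3) -> float:
    """Fraction of papers with at least one matched issue (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(paper_hit(r, threshold) for r in results) / len(results)


# Toy usage with made-up strings: one paper is covered, one is not.
results = [
    PaperResult("A", ["missing requirements file for dependencies"],
                ["requirements file missing dependencies"]),
    PaperResult("B", ["learning rate differs from paper"],
                ["checkpoint download link broken"]),
]
print(hit_rate(results))  # 0.5
```

Because the metric only asks for one match per paper, it rewards coverage rather than precision, which is consistent with the takeaway that agents find relevant issues more reliably than they localize them.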

evaluation agents applications

Key Terms

llm-agent reproducibility semantic-matching