LLM agents can identify reproducibility issues in ML papers by analyzing their code repositories without executing them, but they struggle to localize the problems precisely.
ReproRepo is a scalable framework that uses GitHub issues as a naturally occurring source of evaluation data to test whether AI agents can identify reproducibility problems in machine learning research. In tests on 1,149 papers, the best agent found at least one relevant issue for ~90% of papers, showing that LLMs can spot real-world blockers without running code.
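To make the headline number concrete, the "at least one relevant issue per paper" rate can be sketched as below. This is only an illustration under stated assumptions: the function name `paper_hit_rate`, the toy paper IDs, and the issue labels are all hypothetical, and relevance is simplified to exact label overlap, which is not necessarily how ReproRepo judges a match.

```python
from typing import Dict, Set


def paper_hit_rate(
    predicted: Dict[str, Set[str]],
    reference: Dict[str, Set[str]],
) -> float:
    """Fraction of papers where at least one predicted issue matches a reference issue.

    Assumption: an issue is "relevant" if its label appears in the reference
    set for that paper. A real evaluation would likely need fuzzier matching.
    """
    if not reference:
        raise ValueError("reference must contain at least one paper")
    hits = sum(
        1
        for paper_id, ref_issues in reference.items()
        if predicted.get(paper_id, set()) & ref_issues
    )
    return hits / len(reference)


# Toy data (hypothetical): issues reported on GitHub vs. issues an agent flagged.
reference = {
    "paper_a": {"missing_dependency", "wrong_learning_rate"},
    "paper_b": {"no_checkpoint"},
    "paper_c": {"seed_not_fixed"},
}
predicted = {
    "paper_a": {"missing_dependency"},
    "paper_b": {"unclear_readme"},
    "paper_c": {"seed_not_fixed", "wrong_learning_rate"},
}

rate = paper_hit_rate(predicted, reference)
print(f"Papers with at least one relevant issue found: {rate:.2f}")
```

Note that this paper-level rate is lenient by design: a single matching issue counts as a hit regardless of how many reference issues were missed, which is consistent with a high headline rate coexisting with weak precise localization.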