Measuring research data reuse in scholarly publications using generative artificial intelligence: Open Science Indicator development and preliminary results

Lauren Cadwallader, Iain Hrynaszkiewicz, parth sarin, Tim Vines | April 30, 2026 | arXiv

Key Takeaway

LLMs can automatically detect data reuse in scientific papers, revealing that open data sharing has far greater downstream impact than traditional metrics suggest.

Summary

Researchers used large language models to detect when published studies reuse data from other research. They found that 43% of papers reuse existing data, a much higher rate than previous measurement methods could show. This demonstrates that AI can measure the real-world impact of open science practices at scale.

evaluation applications

Key Terms

language-model, open-science, data-reuse