LLMs can automatically detect data reuse in scientific papers, revealing that open data sharing has far greater downstream impact than traditional metrics suggest.
Researchers used large language models to detect when published studies reuse data from other research. They found that 43% of papers reuse existing data, a much higher share than previous measurement methods could show. This demonstrates that AI can measure the real-world impact of open science practices at scale.