Modern LLMs depend on dozens of upstream models and datasets in complex, recursive ways; ModSleuth makes these invisible dependencies visible and traceable, exposing compliance and transparency issues in LLM development.
This paper introduces ModSleuth, a system that automatically traces the hidden dependencies between models and datasets used to build modern LLMs.
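To illustrate what a recursive dependency means here, the following minimal sketch computes the transitive upstream dependencies of a model from a table of direct dependencies. It is not ModSleuth's implementation: the artifact names, the `DIRECT_DEPS` table, and the `trace_dependencies` function are all hypothetical, and the sketch assumes the direct dependencies are already known, whereas discovering them is the hard part the system addresses.

```python
from collections import deque

# Hypothetical records: each artifact maps to the upstream models and
# datasets it was directly built from. All names are invented.
DIRECT_DEPS = {
    "chat-model": ["base-model", "instruct-dataset"],
    "instruct-dataset": ["teacher-model", "web-corpus"],
    "base-model": ["web-corpus"],
    "teacher-model": ["web-corpus", "licensed-corpus"],
}


def trace_dependencies(root, direct_deps):
    """Map every upstream artifact reachable from `root` to its shortest depth.

    Breadth-first traversal; the `seen` set keeps shared or cyclic
    dependencies from being visited more than once.
    """
    depth = {}
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, d = queue.popleft()
        for dep in direct_deps.get(node, []):
            if dep not in seen:
                seen.add(dep)
                depth[dep] = d + 1
                queue.append((dep, d + 1))
    return depth


if __name__ == "__main__":
    for name, d in trace_dependencies("chat-model", DIRECT_DEPS).items():
        print(f"{name}: depth {d}")
```

In this toy graph, `licensed-corpus` never appears among the direct dependencies of `chat-model`, yet it is reachable three hops upstream through a dataset generated by `teacher-model`. Indirect links of this kind are what make compliance questions hard to answer from a model's own documentation.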