The practice of publicly disclosing what data was used to train a model, enabling researchers to audit and understand potential biases or limitations.