You can efficiently attribute model predictions to training data by measuring how small perturbations in activation space affect outputs, rather than tracking gradients across billions of parameters.
STRIDE is a new method for tracing which training examples influenced a model's predictions. Instead of expensive retraining or tracking billions of parameters, it learns lightweight "steering operators" that show how subsets of training data change model behavior. This makes attribution 13× faster while working better than previous methods.