STRIDE: Training Data Attribution via Sparse Recovery from Subset Perturbations

Rishit Dagli, Abir Harrasse, Luke Zhang, Florent Draye, Amirali Abdullah et al. | June 3, 2026 | arXiv

Key Takeaway

You can efficiently attribute model predictions to training data by measuring how small perturbations in activation space affect outputs, rather than tracking gradients across billions of parameters.

Summary

STRIDE is a new method for tracing which training examples influenced a model's predictions. Instead of expensive retraining or tracking gradients across billions of parameters, it learns lightweight "steering operators" that capture how subsets of training data change model behavior. This makes attribution 13× faster while outperforming previous methods.
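The summary does not spell out the algorithm, so the following is only a toy sketch of the general idea named in the title: recovering a sparse set of influential training examples from the measured effects of subset perturbations. It assumes a synthetic linear effect model and uses a standard Lasso solver (ISTA) as a stand-in; it does not implement STRIDE's steering operators. All names here (`recover_sparse_scores`, `masks`, `effects`) are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of the L1 norm: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def recover_sparse_scores(masks, effects, lam=1.0, n_iters=2000):
    """Estimate per-example scores with ISTA (a Lasso solver).

    masks:   (m, n) 0/1 matrix; masks[i, j] = 1 if example j is in subset i.
    effects: (m,) measured change in the model output for each subset.
    """
    # Center both sides so a constant offset in the effects is ignored.
    X = masks - masks.mean(axis=0)
    y = effects - effects.mean()
    # Step size 1/L, where L is the largest eigenvalue of X^T X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ w - y)  # gradient of 0.5 * ||X w - y||^2
        w = soft_threshold(w - step * grad, step * lam)
    return w


# Synthetic setup (an assumption for illustration, not data from the paper):
# 200 training examples, of which only 5 actually influence the prediction.
rng = np.random.default_rng(0)
n_examples, n_subsets, n_influential = 200, 120, 5

true_w = np.zeros(n_examples)
support = rng.choice(n_examples, size=n_influential, replace=False)
true_w[support] = rng.uniform(1.0, 2.0, n_influential) * rng.choice(
    [-1.0, 1.0], n_influential
)

# Each row is a random subset: every example is included with probability 0.5.
masks = (rng.random((n_subsets, n_examples)) < 0.5).astype(float)
# Assumed linear response: a subset's effect is the sum of its members' weights.
effects = masks @ true_w + 0.01 * rng.standard_normal(n_subsets)

scores = recover_sparse_scores(masks, effects)
top = np.argsort(-np.abs(scores))[:n_influential]
print("recovered:", sorted(top.tolist()))
print("true:     ", sorted(support.tolist()))
```

The point of the sketch is the measurement budget: 120 subset measurements are used to score 200 examples, which is only workable because the true influence vector is assumed to be sparse.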

training, evaluation

Key Terms

training-data-attribution, sparse-recovery, activation-space, steering-operators