Studying how a model's internal computations and representations lead to specific behaviors or failures.
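
As a rough illustration only, the sketch below applies this idea to a toy network rather than a real model. It assumes a hand-weighted two-unit ReLU network that computes XOR. The names `forward`, `ablation_effects`, `h1`, and `h2` are hypothetical, chosen for this example. The code zeroes out one hidden unit at a time (an ablation) and records how the output changes for each input, which attributes a behavior to a specific internal computation.

```python
from itertools import product


def relu(v):
    """Rectified linear unit."""
    return max(0.0, v)


def forward(x1, x2, ablate=None):
    """Run a tiny hand-weighted XOR network (illustrative weights, not learned).

    `ablate` optionally names a hidden unit whose activation is forced to zero.
    Returns the output and the dictionary of hidden activations.
    """
    hidden = {
        "h1": relu(x1 + x2),        # fires when at least one input is on
        "h2": relu(x1 + x2 - 1.0),  # fires only when both inputs are on
    }
    if ablate is not None:
        hidden[ablate] = 0.0  # intervention: knock out one internal unit
    output = hidden["h1"] - 2.0 * hidden["h2"]
    return output, hidden


def ablation_effects(unit):
    """Map each binary input pair to (ablated output - clean output)."""
    effects = {}
    for x1, x2 in product((0, 1), repeat=2):
        clean, _ = forward(x1, x2)
        ablated, _ = forward(x1, x2, ablate=unit)
        effects[(x1, x2)] = ablated - clean
    return effects


if __name__ == "__main__":
    for unit in ("h1", "h2"):
        print(unit, ablation_effects(unit))
```

In this toy setup, ablating `h2` changes the output only for the input `(1, 1)`, which suggests that `h2` is the unit responsible for suppressing the output when both inputs are on. Real models need far more careful methods, but the logic of intervening on an internal component and observing the behavioral change is the same.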