Explainability methods can reveal that neural networks for physics tasks learn interpretable, physically meaningful features—not just statistical shortcuts—enabling scientists to trust and debug AI models in high-energy physics.
This paper compares three explainability methods (GNNExplainer, GNNShap, GradCAM) to probe what drives the predictions of neural networks used for jet tagging at particle colliders. By mapping the resulting explanations onto known physics features such as jet substructure, the authors show that these networks learn genuine QCD patterns rather than statistical shortcuts, and they provide practical tools for interpreting otherwise black-box physics models.
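The paper's actual models and datasets are not reproduced here, but the first of the three methods, GNNExplainer, is available in PyTorch Geometric. Below is a minimal sketch of how one might apply it to a graph-level jet classifier; the `JetGNN` model, its dimensions, and the randomly generated "jet" are illustrative assumptions, not the authors' setup.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool
from torch_geometric.explain import Explainer, GNNExplainer


class JetGNN(torch.nn.Module):
    """Toy jet tagger: particles are nodes carrying kinematic features."""

    def __init__(self, in_dim=4, hidden=32, classes=2):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.lin = torch.nn.Linear(hidden, classes)

    def forward(self, x, edge_index, batch=None):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)  # one embedding per jet
        return self.lin(x)              # raw logits per class


# Fake jet: 16 particles with 4 features (e.g., pT, eta, phi, E),
# fully connected so every particle pair has an edge to score.
num_particles = 16
x = torch.randn(num_particles, 4)
src, dst = torch.meshgrid(
    torch.arange(num_particles), torch.arange(num_particles), indexing="ij"
)
edge_index = torch.stack([src.flatten(), dst.flatten()], dim=0)
batch = torch.zeros(num_particles, dtype=torch.long)  # all nodes in one graph

model = JetGNN()
explainer = Explainer(
    model=model,
    algorithm=GNNExplainer(epochs=100),
    explanation_type="model",      # explain the model's own prediction
    node_mask_type="attributes",   # learn a mask over node features
    edge_mask_type="object",       # learn a mask over edges
    model_config=dict(
        mode="multiclass_classification",
        task_level="graph",
        return_type="raw",
    ),
)
explanation = explainer(x, edge_index, batch=batch)
print(explanation.node_mask.shape)  # (16, 4): feature importance per particle
print(explanation.edge_mask.shape)  # (256,): importance per particle-pair edge
```

The learned masks are what get compared against physics: edges and node features that survive masking can be checked against known substructure observables to test whether the network is keying on physically meaningful patterns.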