Quantitative metrics for evaluating AI explanations (like sparsity and faithfulness) don't predict whether explanations actually help humans make better decisions in high-stakes settings—you need human-centered evaluation, not just mathematical benchmarks.
This paper evaluates eight variants of Shapley values, a popular AI explanation technique, by testing them with practicing financial analysts on fraud-detection and risk-assessment tasks.
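For context, the methods being compared are different ways of approximating the same underlying quantity. A minimal, self-contained sketch (not code from the paper) of the exact Shapley computation that such methods approximate, using a toy linear "risk score" and a fixed baseline for absent features:

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, x, baseline):
    """Exact Shapley values for each feature of x, treating features
    outside a coalition as set to their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            for S in combinations(others, k):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# Hypothetical linear "risk score"; for linear models the Shapley value
# of feature j is simply w[j] * (x[j] - baseline[j]).
w = [2.0, -1.0, 0.5]
f = lambda z: sum(wi * zi for wi, zi in zip(w, z))
phi = exact_shapley(f, x=[1.0, 3.0, 2.0], baseline=[0.0, 0.0, 0.0])
# phi == [2.0, -3.0, 1.0]; the values sum to f(x) - f(baseline)
# (the "efficiency" axiom), which is what makes them attributions.
```

Exact enumeration is exponential in the number of features, which is why practical methods (sampling, kernel-based regression, model-specific shortcuts) approximate it, and why those approximations can differ in the explanations analysts actually see.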