Quantitative metrics for evaluating AI explanations (like sparsity and faithfulness) don't predict whether explanations actually help humans make better decisions in high-stakes settings—you need human-centered evaluation, not just mathematical benchmarks.
This paper evaluates eight variants of Shapley values, a popular AI explanation technique, by testing them with practicing financial analysts on fraud-detection and risk-assessment tasks.
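For context, the methods being compared are different ways of approximating the same underlying quantity. A minimal, self-contained sketch (not code from the paper) of the exact Shapley computation that such methods approximate, using a toy linear "risk score" and a fixed baseline for absent features:

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, x, baseline):
    """Exact Shapley values for each feature of x, treating features
    outside a coalition as set to their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            for S in combinations(others, k):
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

# Hypothetical linear "risk score"; for linear models the Shapley value
# of feature j is simply w[j] * (x[j] - baseline[j]).
w = [2.0, -1.0, 0.5]
f = lambda z: sum(wi * zi for wi, zi in zip(w, z))
phi = exact_shapley(f, x=[1.0, 3.0, 2.0], baseline=[0.0, 0.0, 0.0])
# phi == [2.0, -3.0, 1.0]; the values sum to f(x) - f(baseline)
# (the "efficiency" axiom), which is what makes them attributions.
```

Exact enumeration is exponential in the number of features, which is why practical methods (sampling, kernel-based regression, model-specific shortcuts) approximate it, and why those approximations can differ in the explanations analysts actually see.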