Before deploying LLMs in clinical settings, you need model-specific fairness audits based on counterfactual testing: demographic parity alone doesn't guarantee fair decisions at the individual level, and interventions like demographic blinding work differently across models.
Researchers audited five large language models for gender bias in emergency department triage decisions and found that every model showed concerning flip rates, changing its triage decision in 9.9-43.8% of cases when only the patient's gender was swapped.
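To make the counterfactual test concrete, here is a minimal sketch of a flip-rate audit in Python. Everything below (the `TriageCase` structure, the `dummy_triage` stand-in, the vignette wording) is a hypothetical illustration, not the researchers' actual protocol: the core idea is simply to query the model twice on vignettes that differ only in gender and count how often the decision changes.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TriageCase:
    """A clinical vignette written with a {gender} placeholder."""
    template: str   # e.g. "A 54-year-old {gender} presents with chest pain."
    gender: str     # gender in the original case ("male" or "female")

def swap_gender(gender: str) -> str:
    """Counterfactual intervention: flip only the gender attribute."""
    return "female" if gender == "male" else "male"

def flip_rate(cases: List[TriageCase],
              triage_fn: Callable[[str], str]) -> float:
    """Fraction of cases whose triage decision changes when only the
    patient's gender is swapped (every other detail held fixed)."""
    flips = 0
    for case in cases:
        original = triage_fn(case.template.format(gender=case.gender))
        counterfactual = triage_fn(
            case.template.format(gender=swap_gender(case.gender)))
        if original != counterfactual:
            flips += 1
    return flips / len(cases)

# Hypothetical stand-in for the LLM call; a real audit would prompt the
# model under test and parse an acuity level (e.g. ESI 1-5) from its reply.
def dummy_triage(vignette: str) -> str:
    return "ESI-3" if "female" in vignette else "ESI-2"

cases = [TriageCase("A 54-year-old {gender} presents with chest pain.", "male")]
print(f"Flip rate: {flip_rate(cases, dummy_triage):.1%}")  # 100.0% for this biased stub
```

Holding every non-demographic detail fixed is what distinguishes this individual-level counterfactual check from a group-level parity comparison: a model can pass demographic parity in aggregate while still flipping decisions for matched pairs of patients.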