Mitigating Label Bias with Interpretable Rubric Embeddings

Calvin Isley, Johann D. Gaebler, Sharad Goel | May 20, 2026 | arXiv

Key Takeaway

Replace opaque learned embeddings with interpretable features derived from expert-defined rubrics. In high-stakes decisions, this reduces how much bias a model inherits from biased training labels.

Summary

AI models trained on biased historical data, such as past hiring decisions, learn and perpetuate those biases. This paper proposes "rubric embeddings" in place of black-box embeddings. These are features based on expert-defined criteria, and they are meant to yield fairer predictions. In tests on university admissions data, the approach reduces disparities between groups while maintaining quality.
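To make the idea of a rubric embedding concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation. The rubric criteria, the keyword-matching scorer, and every function name below (`keyword_criterion`, `rubric_embedding`, `fit_logistic`, `selection_rate_gap`) are hypothetical. The keyword scorer stands in for whatever expert or model-based scoring a real rubric would use. The sketch shows three things. First, each feature is a named, human-readable criterion rather than an opaque vector coordinate. Second, a simple classifier trained on those features has one inspectable weight per criterion. Third, the disparity in selection rates between groups can be measured afterward.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical rubric: each named criterion maps an application text to a score in [0, 1].
# Keyword matching is only a placeholder for expert or model-based rubric scoring
# (note that substring matching is crude, e.g. "led" also matches "called").
Rubric = Dict[str, Callable[[str], float]]


def keyword_criterion(keywords: Sequence[str]) -> Callable[[str], float]:
    """Return a scorer giving the fraction of the criterion's keywords found in the text."""
    def score(text: str) -> float:
        lowered = text.lower()
        return sum(k in lowered for k in keywords) / len(keywords)
    return score


RUBRIC: Rubric = {
    "leadership": keyword_criterion(["led", "founded", "captain"]),
    "research": keyword_criterion(["research", "published", "lab"]),
    "service": keyword_criterion(["volunteer", "tutored", "community"]),
}


def rubric_embedding(text: str, rubric: Rubric = RUBRIC) -> List[float]:
    """One interpretable feature per named criterion, in the rubric's fixed order."""
    return [criterion(text) for criterion in rubric.values()]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Full-batch gradient-descent logistic regression on rubric features."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    return _sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def selection_rate_gap(selected: Sequence[int], groups: Sequence[str]) -> float:
    """Largest difference in selection rate between any two groups."""
    rates = []
    for g in set(groups):
        members = [s for s, gg in zip(selected, groups) if gg == g]
        rates.append(sum(members) / len(members))
    return max(rates) - min(rates)
```

Group membership never enters the feature vector here. Each learned weight can be read against a named criterion (for example, `dict(zip(RUBRIC, w))`), so a reviewer can check whether the model leans on criteria the experts endorsed. That is the interpretability the summary attributes to rubric embeddings. Whether it actually reduces disparity on real data is an empirical question, which `selection_rate_gap` can help track.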

alignment, evaluation

Key Terms

rubric, label-bias, proxy-signal, semantic-grounding