Replace opaque learned embeddings with interpretable features derived from expert-defined rubrics, so that models used in high-stakes decisions inherit less bias from their training labels.
When AI models are trained on biased historical data (such as past hiring decisions), they learn and perpetuate those biases. This paper proposes using 'rubric embeddings', features based on expert-defined criteria, instead of black-box embeddings to make fairer predictions. In tests on university admissions data, the approach reduces group disparities while maintaining quality.
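To make the idea concrete, here is a minimal sketch of what a rubric-based feature vector and a group-disparity check might look like. It is an illustration under stated assumptions, not the paper's implementation: the criterion names, the 0-4 scoring scale, the `Applicant` type, and the choice of selection-rate gap as the disparity measure are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical rubric: these criterion names are illustrative, not from the paper.
RUBRIC = ("academic_preparation", "research_experience", "statement_clarity")
MAX_SCORE = 4  # assumed scoring scale of 0-4 per criterion


@dataclass
class Applicant:
    group: str    # demographic group label, used only for auditing
    scores: dict  # criterion name -> expert-assigned rubric score


def rubric_embedding(applicant):
    """Return one interpretable feature per rubric criterion, scaled to [0, 1]."""
    return [applicant.scores[c] / MAX_SCORE for c in RUBRIC]


def selection_rate_gap(applicants, selected):
    """Largest difference in selection rate between any two groups."""
    totals, chosen = {}, {}
    for applicant, is_selected in zip(applicants, selected):
        totals[applicant.group] = totals.get(applicant.group, 0) + 1
        chosen[applicant.group] = chosen.get(applicant.group, 0) + int(is_selected)
    rates = [chosen[g] / totals[g] for g in totals]
    return max(rates) - min(rates)
```

Because each position in the vector corresponds to a named criterion, a reviewer can read off why a score is high or low, which is not possible with a learned embedding. The selection-rate gap is only one common way to quantify group disparity; the paper may use a different measure.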