Human Label Variation as Stable Signal: Learning Annotator-Specific Explanation Behavior via Cross-Annotator Preference Optimization

Beiduo Chen, Pingjun Hong, Ziyun Zhang, Benjamin Roth, Anna Korhonen et al. | May 27, 2026 | arXiv

Key Takeaway

Instead of treating annotator disagreement as noise, you can use it as a signal to train models that reproduce individual annotators' reasoning styles. This is useful for building explanation systems grounded in real human decision-making patterns.

Summary

This paper shows that large language models can learn to mimic how individual annotators explain their decisions, not just which labels they assign. The researchers developed a method called CAPO (Cross-Annotator Preference Optimization) that trains a model on one annotator's explanations while contrasting them with other annotators' valid but different explanations for the same input. The results point to stable, annotator-specific patterns in how people reason about tasks.
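The contrast described above can be pictured as a data-preparation step. The following is a minimal sketch and not the authors' implementation: it assumes each annotation is a record of item, annotator, and free-text explanation, and it pairs the target annotator's explanation (treated as preferred) with every other annotator's explanation of the same item (treated as dispreferred for this annotator, not as wrong). The names `Annotation`, `PreferencePair`, and `build_cross_annotator_pairs` are hypothetical, and the summary does not specify the loss or the data format the paper actually uses.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """One annotator's free-text explanation for one item (hypothetical schema)."""
    item_id: str
    annotator: str
    explanation: str


class PreferencePair(NamedTuple):
    """A preference pair for one target annotator (hypothetical schema)."""
    item_id: str
    chosen: str    # explanation written by the target annotator
    rejected: str  # valid explanation written by a different annotator


def build_cross_annotator_pairs(annotations, target_annotator):
    """Pair the target annotator's explanations with other annotators'
    explanations of the same item. Items that lack either side yield no pairs."""
    # Group annotations by the item they explain.
    by_item = defaultdict(list)
    for ann in annotations:
        by_item[ann.item_id].append(ann)

    pairs = []
    for item_id, anns in by_item.items():
        chosen = [a for a in anns if a.annotator == target_annotator]
        rejected = [a for a in anns if a.annotator != target_annotator]
        # Every (target, other) combination on the same item becomes one pair.
        for c in chosen:
            for r in rejected:
                pairs.append(PreferencePair(item_id, c.explanation, r.explanation))
    return pairs
```

Pairs in this shape could then be passed to any preference-optimization objective. Which objective CAPO uses is not stated in this summary, so that step is left out of the sketch.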

data training evaluation

Key Terms

preference-optimization inter-annotator-agreement supervised-fine-tuning free-text-generation annotator-disagreement