When evaluating hate speech detection systems, using soft labels and explanations that capture human disagreement produces more reliable results than forcing agreement through majority voting.
This paper examines how human disagreement affects both labels and explanations (rationales) in hate speech detection. The researchers unify several evaluation approaches and compare models trained on different representations of labels and rationales, finding that softer representations capture human variation and disagreement better than representations that collapse annotations into a single answer.
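To make the contrast between hard and soft representations concrete, here is a minimal Python sketch. It is not the paper's implementation: the class names, the three-annotator example, the token masks, and the helper functions (`hard_label`, `soft_label`, `soft_rationale`, `cross_entropy`) are all hypothetical, chosen only to illustrate how majority voting discards disagreement while a soft label or soft rationale keeps it.

```python
from collections import Counter
import math


def hard_label(votes):
    """Majority vote: keep the most common label and discard the rest.
    Ties are broken by first occurrence (an arbitrary choice for this sketch)."""
    return Counter(votes).most_common(1)[0][0]


def soft_label(votes, classes):
    """Soft label: the fraction of annotators who chose each class."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]


def soft_rationale(masks):
    """Soft rationale: per token, the fraction of annotators who highlighted it.
    Each mask is a list of 0/1 flags, one per token."""
    n = len(masks)
    return [sum(column) / n for column in zip(*masks)]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Hypothetical annotations for one four-token post (assumed data).
classes = ["hate", "offensive", "normal"]
votes = ["hate", "offensive", "hate"]
masks = [
    [0, 1, 1, 0],  # annotator 1 highlights tokens 1 and 2
    [0, 0, 1, 0],  # annotator 2 highlights token 2 only
    [0, 1, 1, 0],  # annotator 3 highlights tokens 1 and 2
]

hard = hard_label(votes)                # single winning class
soft = soft_label(votes, classes)       # distribution over classes
token_weights = soft_rationale(masks)   # per-token agreement

# Against the soft target, a prediction that mirrors the annotator split
# is penalized less than an overconfident one-hot prediction.
loss_calibrated = cross_entropy(soft, soft)
loss_overconfident = cross_entropy(soft, [1.0, 0.0, 0.0])

print(hard, soft, token_weights)
print(loss_calibrated, loss_overconfident)
```

In this toy case the majority vote reports only the winning class, whereas the soft label records that one of three annotators disagreed, and the soft rationale records that one token was highlighted by everyone and another by only two annotators. Evaluating against the soft target therefore rewards a model for reproducing that uncertainty instead of ignoring it.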