When Certainty Is an Artifact: Keyword Lexicon Blindness and the (Mis)Measurement of Rhetorical Stance

Bo Chen | June 24, 2026 | arXiv

Key Takeaway

Statistically significant findings from keyword-based text analysis can be entirely artifacts of the measurement method rather than real phenomena. Always validate keyword-based results with semantic approaches before drawing conclusions about speaker psychology or discourse patterns.

Summary

This paper shows how keyword-based measurement tools can produce false findings in computational social science. Comparing keyword counting with LLM-based semantic analysis of the same interviews, it finds that a strong statistical correlation between negative affect and certainty disappears, and even reverses, once stance is measured semantically rather than by lexicon hits.
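To make the failure mode concrete, here is a minimal sketch of how a context-free keyword count can report certainty that the speaker never expressed. The lexicon, the negator list, the window size, and the function names are all hypothetical illustrations, not the paper's instruments. The second function is a simple negation heuristic, not the LLM-based classifier the paper uses.

```python
import re

# Hypothetical toy lexicon for illustration only; not the lexicon from the paper.
CERTAINTY_WORDS = {"certain", "certainly", "sure", "definitely", "always"}
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_certainty(text):
    """Count lexicon hits with no regard for context (the keyword approach)."""
    return sum(1 for tok in tokenize(text) if tok in CERTAINTY_WORDS)


def negation_aware_certainty(text, window=2):
    """Count lexicon hits, skipping any hit preceded by a negator
    within `window` tokens. A crude stand-in for syntactic awareness."""
    toks = tokenize(text)
    count = 0
    for i, tok in enumerate(toks):
        if tok not in CERTAINTY_WORDS:
            continue
        context = toks[max(0, i - window):i]
        negated = any(c in NEGATORS or c.endswith("n't") for c in context)
        if not negated:
            count += 1
    return count


examples = [
    "I'm not sure what happened",        # negated: expresses doubt
    "I don't feel certain at all",       # negated: expresses doubt
    "I am definitely sure about it",     # genuine certainty
    "There was a certain sadness to it", # polysemy: "certain" means "particular"
]

for sentence in examples:
    print(keyword_certainty(sentence), negation_aware_certainty(sentence), sentence)
```

The keyword count scores both negated sentences as certain, which illustrates syntactic blindness. The negation heuristic fixes those two cases but still counts "a certain sadness" as certainty, which illustrates polysemy blindness. Resolving that case requires reading the word in context, which is the motivation for a semantic classifier.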

evaluation data

Key Terms

keyword-lexicon, zero-shot-semantic-classification, measurement-artifact, polysemy-blindness, syntactic-blindness