When statistically comparing the semantic breadth of words using embeddings, you must account for directional differences or your significance tests will be unreliable. Naive tests fail because they confuse directional differences (words pointing in different semantic directions) with genuine breadth differences (how widely a word's meaning spreads across contexts). This paper provides a practical, GPU-accelerated solution to that statistical problem.
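The confound can be sketched numerically. In the following illustrative example (not the paper's method), semantic breadth is measured as the mean distance of a word's contextual embeddings from their own centroid; two synthetic words have identical breadth but point in different directions, so a naive comparison of their embedding clouds picks up the directional offset rather than any breadth difference. All names and parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical contextual embeddings for two words: same spread,
# different semantic directions (all values are synthetic).
dim = 50
direction_a = rng.normal(size=dim)
direction_b = rng.normal(size=dim)
contexts_a = direction_a + 0.5 * rng.normal(size=(200, dim))
contexts_b = direction_b + 0.5 * rng.normal(size=(200, dim))

def breadth(embeddings):
    """Illustrative breadth measure: mean distance of context vectors
    from their own centroid, which isolates spread from direction."""
    centroid = embeddings.mean(axis=0)
    return np.linalg.norm(embeddings - centroid, axis=1).mean()

# A naive comparison of the two clouds is dominated by the directional
# offset between the words, not by any difference in breadth.
directional_offset = np.linalg.norm(contexts_a.mean(axis=0)
                                    - contexts_b.mean(axis=0))

print(f"breadth A: {breadth(contexts_a):.3f}")
print(f"breadth B: {breadth(contexts_b):.3f}")
print(f"directional offset (confound): {directional_offset:.3f}")
```

With the seed above, the two breadth estimates come out nearly equal while the directional offset is much larger, which is exactly the signal a naive significance test would mistake for a breadth difference.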