Language models don't truly understand unit conversion: they rely on shortcuts based on raw numbers and unit symbols, and those shortcuts break down when a comparison is a close call.
This paper investigates how language models compare quantities expressed in different units (for example, 110 cm vs 1.2 m). The researchers found that, rather than converting both quantities to a common scale, models rely on heuristics based on number differences and unit scales. These shortcuts lead to systematic errors near comparison boundaries.
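To make the contrast concrete, here is a minimal Python sketch comparing a proper conversion to a common scale with one possible shortcut. The shortcut (`compare_shortcut`) is a hypothetical illustration invented for this example, not the mechanism identified in the paper: it adds the order of magnitude of the raw number to the power of ten of the unit and falls back to the raw numbers on a tie. The unit table and function names are likewise assumptions.

```python
import math
from fractions import Fraction

# Power of ten relating each unit to metres (illustrative subset).
UNIT_EXP = {"mm": -3, "cm": -2, "m": 0, "km": 3}

def sign(x, y):
    """Return 1 if x > y, -1 if x < y, 0 if equal."""
    return (x > y) - (x < y)

def compare_converted(a, unit_a, b, unit_b):
    """Correct comparison: convert both quantities to metres exactly."""
    left = Fraction(str(a)) * Fraction(10) ** UNIT_EXP[unit_a]
    right = Fraction(str(b)) * Fraction(10) ** UNIT_EXP[unit_b]
    return sign(left, right)

def compare_shortcut(a, unit_a, b, unit_b):
    """Hypothetical heuristic (an assumption, not the paper's finding):
    compare coarse scores built from the number's order of magnitude and
    the unit's scale; on a tie, compare the raw numbers directly.
    Assumes positive values."""
    coarse_a = math.floor(math.log10(a)) + UNIT_EXP[unit_a]
    coarse_b = math.floor(math.log10(b)) + UNIT_EXP[unit_b]
    if coarse_a != coarse_b:
        return sign(coarse_a, coarse_b)
    return sign(a, b)  # ignores units entirely

# Far apart: 5 km vs 30 cm. Both approaches agree.
print(compare_converted(5, "km", 30, "cm"), compare_shortcut(5, "km", 30, "cm"))

# Close call: 110 cm vs 1.2 m. The shortcut picks the wrong side.
print(compare_converted(110, "cm", 1.2, "m"), compare_shortcut(110, "cm", 1.2, "m"))
```

In this sketch the two functions agree when the quantities differ by orders of magnitude and disagree on the 110 cm vs 1.2 m pair, which is the kind of near-boundary failure the summary describes.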