Language Models Compare Quantities Using Number-specific and Unit-specific Heuristics

Mutsumi Sasaki, Go Kamoda, Ryosuke Takahashi, Kosuke Sato, Kentaro Inui et al. | June 2, 2026 | arXiv

Key Takeaway

Language models don't truly perform unit conversion when comparing quantities. They use shortcuts based on the raw numbers and the unit symbols, and those shortcuts break down when the two quantities are close in value.

Summary

This paper investigates how language models compare quantities that carry units (for example, 110 cm vs. 1.2 m). The researchers find that, rather than converting both quantities to a common scale, models rely on heuristics based on the difference between the raw numbers and on the scale of the units. These shortcuts lead to systematic errors near the comparison boundary, where the two quantities are close in value.
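To make the contrast concrete, here is a minimal Python sketch of the two strategies. It is an illustration only, not the paper's method or its identified mechanism: the function names, the unit table, and the specific shortcut rule ("the larger unit wins when units differ") are all assumptions chosen to show how a unit-based heuristic can agree with true conversion on easy cases yet fail on a close call.

```python
# Conversion factors to millimeters (illustrative subset, assumed for this sketch).
TO_MM = {"mm": 1, "cm": 10, "m": 1000, "km": 1_000_000}


def compare_by_conversion(a_value, a_unit, b_value, b_unit):
    """Return True if quantity a is larger than b after converting both to mm."""
    return a_value * TO_MM[a_unit] > b_value * TO_MM[b_unit]


def compare_by_shortcut(a_value, a_unit, b_value, b_unit):
    """Hypothetical heuristic: if the units differ, the larger unit wins;
    if the units match, compare the raw numbers. No conversion is performed."""
    if a_unit != b_unit:
        return TO_MM[a_unit] > TO_MM[b_unit]
    return a_value > b_value


if __name__ == "__main__":
    cases = [
        (110, "cm", 1.2, "m"),  # the example from the summary
        (130, "cm", 1.2, "m"),  # a close call on the other side of the boundary
        (5, "cm", 3, "km"),     # far from the boundary
    ]
    for a_value, a_unit, b_value, b_unit in cases:
        truth = compare_by_conversion(a_value, a_unit, b_value, b_unit)
        guess = compare_by_shortcut(a_value, a_unit, b_value, b_unit)
        print(f"{a_value} {a_unit} > {b_value} {b_unit}? "
              f"conversion={truth}, shortcut={guess}")
```

In this sketch the shortcut matches the converted comparison for 110 cm vs. 1.2 m and for 5 cm vs. 3 km, but gives the wrong answer for 130 cm vs. 1.2 m, which sits just past the comparison boundary.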

evaluation reasoning

Key Terms

linear-probes causal-intervention quantitative-reasoning