Different language models (Transformers, RNNs, LSTMs) independently learn to represent numbers using periodic patterns with periods of 2, 5, and 10—a phenomenon called convergent evolution. While this convergence arises across architectures, whether the learned features are actually useful for arithmetic depends on training signals such as text-number co-occurrence or multi-token addition problems.
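The kind of periodicity described above can be illustrated with a small sketch. The code below fabricates synthetic "embeddings" for the numbers 0–99 that mix sinusoidal components with periods 2, 5, and 10 (the periods reported for real models) with noise, then recovers those periods via a Fourier transform along the number axis. The data, dimensions, and noise level are all illustrative assumptions; analyses of real models would apply the same idea to actual number-token embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
numbers = np.arange(100)

# Synthetic embeddings: each of 64 dimensions mixes periodic signals
# (periods 2, 5, 10) with small Gaussian noise. This data is fabricated
# for illustration, not taken from any real model.
periods = [2, 5, 10]
signals = np.stack([np.cos(2 * np.pi * numbers / p) for p in periods])  # (3, 100)
mixing = rng.normal(size=(64, 3))
embeddings = mixing @ signals + 0.1 * rng.normal(size=(64, 100))

# Detect periodicity: take the Fourier transform along the number axis
# and find the frequencies with the most power, averaged over dimensions.
power = np.abs(np.fft.rfft(embeddings, axis=1)).mean(axis=0)
freqs = np.fft.rfftfreq(len(numbers))              # cycles per unit number
top = freqs[np.argsort(power[1:])[::-1][:3] + 1]   # top 3, skipping DC
recovered_periods = sorted(1 / top)
print(recovered_periods)  # expect values near [2, 5, 10]
```

The same frequency-domain probe is one simple way to test whether a given model's number representations carry the periodic structure at all, independently of whether downstream training signals make that structure useful for arithmetic.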