Different language models (Transformers, RNNs, LSTMs) independently learn to represent numbers using periodic patterns with periods of 2, 5, and 10—a phenomenon called convergent evolution. While this convergence arises across architectures, whether the learned features are actually useful for arithmetic depends on training signals such as text-number co-occurrence or multi-token addition problems.
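The kind of periodicity described above can be illustrated with a small sketch. The code below fabricates synthetic "embeddings" for the numbers 0–99 that mix sinusoidal components with periods 2, 5, and 10 (the periods reported for real models) with noise, then recovers those periods via a Fourier transform along the number axis. The data, dimensions, and noise level are all illustrative assumptions; analyses of real models would apply the same idea to actual number-token embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
numbers = np.arange(100)

# Synthetic embeddings: each of 64 dimensions mixes periodic signals
# (periods 2, 5, 10) with small Gaussian noise. This data is fabricated
# for illustration, not taken from any real model.
periods = [2, 5, 10]
signals = np.stack([np.cos(2 * np.pi * numbers / p) for p in periods])  # (3, 100)
mixing = rng.normal(size=(64, 3))
embeddings = mixing @ signals + 0.1 * rng.normal(size=(64, 100))

# Detect periodicity: take the Fourier transform along the number axis
# and find the frequencies with the most power, averaged over dimensions.
power = np.abs(np.fft.rfft(embeddings, axis=1)).mean(axis=0)
freqs = np.fft.rfftfreq(len(numbers))              # cycles per unit number
top = freqs[np.argsort(power[1:])[::-1][:3] + 1]   # top 3, skipping DC
recovered_periods = sorted(1 / top)
print(recovered_periods)  # expect values near [2, 5, 10]
```

The same frequency-domain probe is one simple way to test whether a given model's number representations carry the periodic structure at all, independently of whether downstream training signals make that structure useful for arithmetic.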