This paper investigates how large language models memorize facts by testing whether they can answer questions about the same entity when it is referred to by different names and spellings. The finding: LLMs do not memorize facts in a surface-invariant way; their ability to answer a factual question depends heavily on which name or spelling variant of the entity is used, suggesting that memorization is tied to the specific linguistic forms encountered during training.
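A minimal sketch of such a surface-form sensitivity probe is shown below (this is not the paper's code; the model, alias list, and question template are illustrative assumptions): ask the same factual question with several surface forms of one entity, then check whether the answers agree.

```python
# Illustrative sketch of a surface-form sensitivity probe, assuming a
# Hugging Face causal LM. Alias sets and the question template are made up.
from transformers import pipeline

# Hypothetical alias set: several surface forms denoting the same entity.
ALIASES = {
    "Mozart": ["Mozart", "Wolfgang Amadeus Mozart", "W. A. Mozart"],
}
QUESTION = "In which year was {} born? Answer with the year only."

# Placeholder model; the paper's actual models may differ.
generator = pipeline("text-generation", model="gpt2")

def answer(prompt: str) -> str:
    # Greedy decoding keeps the probe deterministic across runs.
    out = generator(prompt, max_new_tokens=8, do_sample=False)[0]["generated_text"]
    return out[len(prompt):].strip()

for entity, variants in ALIASES.items():
    # Collect one answer per surface form of the same entity.
    answers = {v: answer(QUESTION.format(v)) for v in variants}
    # If memorization were surface-invariant, all answers would match.
    consistent = len(set(answers.values())) == 1
    print(f"{entity}: consistent across variants = {consistent}")
    for variant, a in answers.items():
        print(f"  {variant!r} -> {a!r}")
```

Aggregating the consistency flag over many entities would give a simple sensitivity measure: the fraction of entities whose answer changes when only the surface form of the name changes.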