Historical text imposes a consistent encoding penalty on LLMs, but models retain semantic understanding—making them safe for retrieval tasks if generative applications are adapted with temporal context.
This paper diagnoses why language models struggle with historical text by breaking down the problem into four dimensions: tokenization cost, predictive uncertainty, semantic robustness, and context sensitivity.