Training LLMs on chronologically ordered rather than shuffled data improves their knowledge of recent facts and their temporal accuracy, suggesting that data ordering matters for building models that stay current.
This paper investigates how the order of training data affects what LLMs learn about time-sensitive facts. Researchers trained 6B-parameter models on chronologically ordered data versus shuffled data, and found that sequential training produces models with more current and accurate temporal knowledge while maintaining general language understanding.
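To make the two training conditions concrete, here is a minimal sketch of how the same corpus could be arranged in chronological order and in shuffled order. It is an illustration only, not the paper's pipeline: the toy corpus, the `(date, text)` record layout, and the helper names `chronological_order` and `shuffled_order` are all assumptions introduced here.

```python
import random
from datetime import date

# Hypothetical toy corpus of (publication date, text) pairs.
# A real corpus would need a reliable timestamp per document.
corpus = [
    (date(2021, 5, 1), "doc C"),
    (date(2019, 1, 15), "doc A"),
    (date(2023, 8, 30), "doc D"),
    (date(2020, 11, 3), "doc B"),
]

def chronological_order(docs):
    """Return the documents sorted oldest to newest by timestamp."""
    return sorted(docs, key=lambda d: d[0])

def shuffled_order(docs, seed=0):
    """Return the same documents in a seeded random order (the baseline)."""
    rng = random.Random(seed)
    out = list(docs)  # copy so the input list is left untouched
    rng.shuffle(out)
    return out

chrono = chronological_order(corpus)
shuffled = shuffled_order(corpus)
```

Both orderings contain exactly the same documents, so any difference between models trained on them would come from the order of presentation rather than from the content.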