Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

Juliana Li, Diya Sreedhar | June 24, 2026 | arXiv

Key Takeaway

Language models do not forget rules at random during training. Whether a rule survives is determined by corpus statistics (its support frequency), and the process is irreversible: data manipulation can kill a learned behavior but cannot resurrect it.

Summary

During language model pretraining, learned rules such as pronoun-gender agreement unexpectedly disappear mid-training, even though evidence for them remains in the data. This "natural ungrokking" is predictable: whether a rule survives depends on how often the training data supports it relative to competing patterns.
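The idea of support "relative to competing patterns" can be made concrete with a toy sketch. It assumes, hypothetically, that support is measured only on examples where a rule and a competing surface pattern make different predictions, by counting how often the observed text sides with the rule. The schema (`Example`), the helpers (`agreement_rule`, `nearest_noun_pattern`, `relative_support`, `predicted_to_survive`), and the 0.5 threshold are all illustrative and do not come from the paper, whose actual measure may differ.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    """One pronoun occurrence in a toy corpus (hypothetical schema)."""
    subject_gender: str  # gender of the true antecedent
    nearest_gender: str  # gender of the noun closest to the pronoun
    pronoun_gender: str  # gender of the pronoun actually observed


Rule = Callable[[Example], str]


def agreement_rule(ex: Example) -> str:
    # The underlying rule: the pronoun agrees with its antecedent.
    return ex.subject_gender


def nearest_noun_pattern(ex: Example) -> str:
    # A competing surface pattern: copy the gender of the closest noun.
    return ex.nearest_gender


def support_count(rule: Rule, corpus: Sequence[Example]) -> int:
    # Number of examples whose observed pronoun matches the rule's prediction.
    return sum(1 for ex in corpus if rule(ex) == ex.pronoun_gender)


def relative_support(rule: Rule, competitor: Rule, corpus: Sequence[Example]) -> float:
    # Only examples where the two patterns disagree can tell them apart.
    contested = [ex for ex in corpus if rule(ex) != competitor(ex)]
    if not contested:
        raise ValueError("corpus never distinguishes the two patterns")
    wins = sum(1 for ex in contested if rule(ex) == ex.pronoun_gender)
    return wins / len(contested)


def predicted_to_survive(rule: Rule, competitor: Rule,
                         corpus: Sequence[Example], threshold: float = 0.5) -> bool:
    # Hypothetical decision rule: the rule persists if it wins more than
    # `threshold` of the contested examples.
    return relative_support(rule, competitor, corpus) > threshold


corpus = [
    Example("f", "f", "f"),  # both patterns agree: uninformative
    Example("f", "m", "f"),  # agreement rule wins
    Example("m", "f", "f"),  # surface pattern wins (attraction error in the data)
    Example("m", "f", "m"),  # agreement rule wins
]

print(relative_support(agreement_rule, nearest_noun_pattern, corpus))
print(predicted_to_survive(agreement_rule, nearest_noun_pattern, corpus))
```

With this toy corpus the agreement rule wins two of the three contested examples, so `relative_support` returns about 0.667 and the sketch predicts that the rule survives. Examples on which both patterns agree are ignored, so adding more of them would not change the prediction.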


Key Terms

grokking, support-frequency, behavioral-collapse, surface-pattern