Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay

Martin Marek, Dongkyu Cho, Shikai Qiu, Rumi Chunara, Pavel Izmailov et al. | May 25, 2026 | arXiv

Key Takeaway

Self-generated replay nearly eliminates catastrophic forgetting in language models, but capacity constraints are the real bottleneck: a saturated model can't learn new tasks without forgetting, no matter what technique you use.

Summary

When language models learn new tasks, they forget old ones. This paper shows that a model can generate its own training data and replay it alongside the new task to prevent forgetting, but only if the model has spare capacity. If a model is already saturated from pretraining, no amount of replay helps: it must overwrite old knowledge to learn anything new.
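As a rough illustration of the replay idea described above, the sketch below mixes new-task examples with samples drawn from the model before it is updated. This summary does not give the paper's actual procedure, so everything here is an assumption: the function name `build_replay_mix`, the `generate` callable standing in for sampling from the pre-update model, and the `replay_fraction` mixing ratio are all hypothetical.

```python
import random
from typing import Callable, List


def build_replay_mix(
    new_task_examples: List[str],
    generate: Callable[[], str],   # hypothetical: samples one text from the pre-update model
    replay_fraction: float = 0.5,  # assumed mixing ratio, not taken from the paper
    seed: int = 0,
) -> List[str]:
    """Return new-task examples shuffled together with self-generated replay samples."""
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")

    # Choose the replay count so replay makes up about replay_fraction of the final mix.
    n_new = len(new_task_examples)
    n_replay = round(n_new * replay_fraction / (1.0 - replay_fraction))

    # Replay data comes from the model itself, not from the original pretraining corpus.
    replay = [generate() for _ in range(n_replay)]

    mixed = list(new_task_examples) + replay
    random.Random(seed).shuffle(mixed)  # seeded shuffle keeps the mix reproducible
    return mixed


if __name__ == "__main__":
    # Stand-in generator; a real setup would sample from the language model.
    batch = build_replay_mix(["new-1", "new-2", "new-3", "new-4"], lambda: "self-generated")
    print(len(batch), batch.count("self-generated"))
```

Fine-tuning on the mixed list instead of the new-task examples alone is what would let old behavior be rehearsed during the update. By the summary's account, this only helps when the model has spare capacity.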

training efficiency

Key Terms

catastrophic-forgetting, replay, model-capacity, continual-learning