Self-generated replay nearly eliminates catastrophic forgetting in language models, but capacity constraints are the real bottleneck: a saturated model can't learn new tasks without forgetting, no matter what technique you use.
When language models learn new tasks, they forget old ones. This paper shows that a model can generate its own training data and replay it to prevent forgetting, but only if it has spare capacity. If a model is already saturated from pretraining, no amount of replay helps: it must overwrite old knowledge to learn anything new.
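As a rough illustration of the replay idea, the sketch below builds a fine-tuning set that mixes new-task examples with text sampled from the model itself before it sees the new task. Everything named here is an assumption made for illustration rather than a detail from the paper: the `generate` callable stands in for whatever sampling routine the model exposes, and `replay_ratio` and the shuffle-based mixing are placeholder choices.

```python
import random
from typing import Callable, List


def build_replay_mix(
    generate: Callable[[], str],      # hypothetical: returns one sample from the current model
    new_task_examples: List[str],
    replay_ratio: float = 0.5,        # assumed fraction of the final mix that is replay data
    seed: int = 0,
) -> List[str]:
    """Mix new-task examples with samples the model generated itself."""
    if not 0.0 <= replay_ratio < 1.0:
        raise ValueError("replay_ratio must be in [0, 1)")

    # Choose the replay count so replay makes up `replay_ratio` of the mix.
    n_replay = round(len(new_task_examples) * replay_ratio / (1.0 - replay_ratio))

    # Sample BEFORE any training on the new task, so the replay data
    # reflects what the model knew beforehand.
    replay = [generate() for _ in range(n_replay)]

    mixed = list(new_task_examples) + replay
    random.Random(seed).shuffle(mixed)  # interleave old and new examples
    return mixed


if __name__ == "__main__":
    # Stub generator standing in for a real model's sampling call.
    mix = build_replay_mix(lambda: "self-generated text", ["new example"] * 4)
    print(len(mix), mix.count("self-generated text"))
```

Note that a mix like this only addresses the forgetting side. It does nothing about the capacity limit described above: if the model has no spare capacity, training on the mix still forces a trade-off between old and new knowledge.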