Language models can improve continuously by periodically consolidating in-context learning into permanent parameters and self-generating training data, rather than only learning from human-provided examples.
This paper proposes a 'Sleep' paradigm for language models that enables continual learning and knowledge consolidation. During sleep, a model first distills its short-term memories into long-term parameters through knowledge seeding (upward distillation), and then self-improves through dreaming, in which reinforcement learning is used to generate synthetic training data.
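To make the consolidation idea concrete, the following is a minimal toy sketch, not the paper's method. It assumes that "short-term memory" can be modeled as a context-dependent shift on a model's output logits, and that consolidation means training the context-free parameters to reproduce the with-context distribution by minimizing a KL divergence. The names `consolidate`, `param_logits`, and `context_bias` are hypothetical, and the sketch covers neither the upward-distillation details nor the reinforcement-learning dreaming stage.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def consolidate(param_logits, context_bias, steps=500, lr=0.5):
    """Distill a with-context distribution into context-free parameters.

    Assumption (illustrative only): in-context information acts as an
    additive bias on the logits produced by the permanent parameters.
    """
    # Frozen teacher: permanent parameters plus the in-context shift.
    teacher = softmax([w + b for w, b in zip(param_logits, context_bias)])
    student = list(param_logits)
    for _ in range(steps):
        probs = softmax(student)
        # The gradient of KL(teacher || softmax(student)) with respect to
        # the student logits is (probs - teacher).
        student = [s - lr * (p - t) for s, p, t in zip(student, probs, teacher)]
    return student, teacher

params = [0.0, 0.0, 0.0]          # permanent parameters before sleep
context_bias = [2.0, 0.0, -1.0]   # what the context temporarily "taught"
new_params, teacher = consolidate(params, context_bias)

kl_before = kl_divergence(teacher, softmax(params))
kl_after = kl_divergence(teacher, softmax(new_params))
print(f"KL before: {kl_before:.4f}, KL after: {kl_after:.6f}")
```

After the loop, the updated parameters reproduce the with-context behavior without the context being present, which is the sense of "consolidation" used in this sketch.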