Language models can improve long-context reasoning by periodically consolidating recent information into fast weights during offline 'sleep' phases, trading inference latency for better performance on reasoning-heavy tasks.
This paper proposes a sleep-like mechanism for language models: periodically, the model consolidates recent context into persistent memory and then clears its cache. During each 'sleep' phase, the model runs offline passes that update fast weights in its state-space blocks, shifting computation away from real-time inference.
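
To make the consolidate-then-clear cycle concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes a Hebbian outer-product rule as a stand-in for the fast-weight update, a plain matrix in place of a state-space block, and a fixed cache size as the sleep trigger. The class `SleepConsolidatingMemory` and its methods `observe`, `sleep`, and `recall` are hypothetical names introduced only for illustration.

```python
# Toy sketch only. The outer-product update, the matrix memory, and the
# cache-size trigger are illustrative assumptions, not the paper's method.


class SleepConsolidatingMemory:
    """Hypothetical illustration: a context cache plus a fast-weight matrix."""

    def __init__(self, dim, cache_limit):
        self.dim = dim
        self.cache_limit = cache_limit  # assumed trigger: sleep when the cache is full
        self.cache = []                 # recent (key, value) pairs seen online
        self.fast_weights = [[0.0] * dim for _ in range(dim)]

    def observe(self, key, value):
        """Online step: store the pair in the cache; sleep once the limit is hit."""
        self.cache.append((key, value))
        if len(self.cache) >= self.cache_limit:
            self.sleep()

    def sleep(self, lr=1.0):
        """Offline step: fold cached pairs into the fast weights, then clear the cache."""
        for key, value in self.cache:
            for i in range(self.dim):
                for j in range(self.dim):
                    # Outer-product (Hebbian) update: W += lr * value * key^T
                    self.fast_weights[i][j] += lr * value[i] * key[j]
        self.cache.clear()

    def recall(self, key):
        """Read from the fast weights alone: returns W @ key."""
        return [sum(w * k for w, k in zip(row, key)) for row in self.fast_weights]


memory = SleepConsolidatingMemory(dim=3, cache_limit=2)
memory.observe([1.0, 0.0, 0.0], [1.0, 2.0, 3.0])
memory.observe([0.0, 1.0, 0.0], [4.0, 5.0, 6.0])  # cache is full, so sleep runs here
recalled = memory.recall([1.0, 0.0, 0.0])
```

In this sketch the second `observe` call fills the cache and triggers `sleep`, after which the cache is empty and `recall` answers from the fast weights only. With orthonormal keys, as used here, the stored values are recovered exactly; with correlated keys a plain outer-product memory would return blended values, which is one reason a real system would need a more careful update rule.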