Pretraining Recurrent Networks without Recurrence

Akarsh Kumar, Phillip Isola | June 4, 2026 | arXiv

Key Takeaway

Supervised Memory Training (SMT) decouples learning what to remember from learning how to update memory, so RNNs can be trained in parallel with stable gradients. This could make RNNs competitive with Transformers on long-sequence tasks without requiring sequential computation during training.

Summary

This paper proposes Supervised Memory Training (SMT), a way to train recurrent neural networks that avoids the sequential bottleneck of standard backpropagation through time. Instead of unrolling the RNN over long sequences, SMT first trains a Transformer to learn what information to remember, then uses the memory labels it produces to train the RNN in parallel.
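The second stage can be illustrated with a toy sketch. This is not the authors' implementation, and it rests on several assumptions: the Transformer teacher is replaced by a known linear recurrence that generates the memory labels, the RNN cell is linear, and the cell is fit in closed form by ridge regression instead of gradient descent. All function names are hypothetical. The point it shows is that once memory labels exist, every time step becomes an independent supervised example (previous memory, current input) -> next memory, so fitting the cell needs no unrolling.

```python
import numpy as np


def make_stable_weights(rng, d_mem, d_in):
    """Stand-in for the teacher: random linear cell with a contractive memory part."""
    w = rng.normal(size=(d_mem + d_in, d_mem))
    # Rescale the memory-to-memory block so its spectral norm is 0.5.
    w[:d_mem] *= 0.5 / np.linalg.norm(w[:d_mem], 2)
    return w


def rollout(weights, m0, inputs):
    """Sequential reference: m_t = [m_{t-1}, x_t] @ weights."""
    m = m0
    states = [m0]
    for x in inputs:
        m = np.concatenate([m, x]) @ weights
        states.append(m)
    return np.stack(states)  # shape (T + 1, d_mem)


def make_transition_pairs(memory_labels, inputs):
    """Turn one labelled sequence into T independent training examples."""
    prev_m = memory_labels[:-1]    # m_0 .. m_{T-1}
    target_m = memory_labels[1:]   # m_1 .. m_T
    features = np.concatenate([prev_m, inputs], axis=1)
    return features, target_m


def fit_linear_cell(features, targets, ridge=1e-6):
    """Fit the cell on all time steps at once (no unrolling through time)."""
    d = features.shape[1]
    gram = features.T @ features + ridge * np.eye(d)
    return np.linalg.solve(gram, features.T @ targets)


# Toy run: labels come from the stand-in teacher, not from a Transformer.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(200, 3))
teacher_w = make_stable_weights(rng, d_mem=4, d_in=3)
memory_labels = rollout(teacher_w, np.zeros(4), inputs)

features, targets = make_transition_pairs(memory_labels, inputs)
student_w = fit_linear_cell(features, targets)
max_weight_error = np.abs(student_w - teacher_w).max()
```

The loop in `rollout` is only used to generate labels and to check the fitted cell afterwards; the fit itself is a single batched solve over all time steps. A nonlinear cell trained by gradient descent on the same (features, targets) pairs would follow the same pattern.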

training, architecture, efficiency

Key Terms

backpropagation-through-time, recurrent-neural-networks, memory-transition, predictive-state-objective