Recurrent models can match Transformer efficiency when their forget gates are content-aware (conditioned on stored memory) rather than memory-blind. This change enables a mathematical solver that speeds up training while improving language understanding.
CARVE improves recurrent neural networks by fixing how they decide what to forget. Instead of gates that only see new incoming data, CARVE's gates look at what's already stored in memory before deciding what to erase. This single change fixes three architectural problems, enables faster training, and achieves better performance on language tasks while using less memory than competing approaches.
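
To make the gate distinction concrete, here is a minimal NumPy sketch. It is not CARVE's actual formulation, which this summary does not specify. The function names, the weight matrices (`W_x`, `W_h`, `W_v`), the sigmoid gate, and the update rule are all illustrative assumptions chosen to show only the difference between a gate that sees the input alone and a gate that also reads the stored memory.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def blind_forget_gate(x, W_x):
    """Memory-blind gate (hypothetical): depends only on the incoming input x."""
    return sigmoid(W_x @ x)


def content_aware_forget_gate(x, h_prev, W_x, W_h):
    """Content-aware gate (hypothetical): also reads the stored memory h_prev."""
    return sigmoid(W_x @ x + W_h @ h_prev)


def step(x, h_prev, gate, W_v):
    """One recurrent update: keep `gate` of the old memory, write the rest."""
    candidate = np.tanh(W_v @ x)
    return gate * h_prev + (1.0 - gate) * candidate


# Small deterministic weights, purely for illustration (assumed values).
d_in, d_mem = 4, 3
W_x = np.full((d_mem, d_in), 0.1)
W_h = np.eye(d_mem)
W_v = np.full((d_mem, d_in), 0.2)

x = np.ones(d_in)            # the same incoming input in every case
h_empty = np.zeros(d_mem)    # nothing stored yet
h_full = np.ones(d_mem)      # memory already holds content

# The blind gate cannot react to memory: it has no access to it.
blind = blind_forget_gate(x, W_x)

# The content-aware gate gives different answers for different memories.
aware_empty = content_aware_forget_gate(x, h_empty, W_x, W_h)
aware_full = content_aware_forget_gate(x, h_full, W_x, W_h)

h_next = step(x, h_full, aware_full, W_v)
```

With identical input, the blind gate returns one value no matter what is stored, while the content-aware gate changes its decision once the memory is non-empty. In this toy setup the two gates coincide only when the memory is all zeros.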