CARVE: Content-Aware Recurrent with Value Efficiency for Chunk-Parallel Linear Attention

Sayak Dutta | June 25, 2026 | arXiv

Key Takeaway

Recurrent models can match Transformer efficiency when their forget gates are content-aware, meaning they look at stored memory rather than being memory-blind. This change enables a mathematical solver that speeds up training while also improving language understanding.

Summary

CARVE improves recurrent neural networks by fixing how they decide what to forget. Instead of gates that only see new incoming data, CARVE's gates look at what's already stored in memory before deciding what to erase. This single change fixes three architectural problems, enables faster training, and achieves better performance on language tasks while using less memory than competing approaches.
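
As a rough illustration of the distinction described above, the following NumPy sketch contrasts a memory-blind forget gate with a content-aware one on a matrix-valued linear-attention state. It is not CARVE's actual formulation: the scalar gate, the outer-product write, the weight vectors `w_in` and `w_mem`, and the function names are all assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def step_memory_blind(S, k, v, w_in):
    """One recurrent step whose forget gate sees only the incoming key k.

    S: (d_v, d_k) memory state, k: (d_k,) key, v: (d_v,) value,
    w_in: (d_k,) hypothetical gate weights.
    """
    g = sigmoid(w_in @ k)              # gate computed from the new token alone
    S_new = g * S + np.outer(v, k)     # decay old memory, then write (v, k)
    return S_new, g


def step_content_aware(S, k, v, w_mem):
    """One recurrent step whose forget gate first reads the stored memory.

    w_mem: (d_v,) hypothetical gate weights applied to the memory readout.
    """
    readout = S @ k                    # what the memory currently holds for key k
    g = sigmoid(w_mem @ readout)       # gate computed from stored content
    S_new = g * S + np.outer(v, k)     # same decay-and-write update as above
    return S_new, g
```

Given the same token, `step_memory_blind` returns the same gate value whether the state is empty or populated, whereas `step_content_aware` returns a different gate because it reads `S @ k` before deciding how much to erase.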

architecture, efficiency, training

Key Terms

recurrent neural network transducer, linear attention, gating mechanism, chunk-parallel training