You can efficiently extend pretrained LLMs to handle much longer contexts by converting them into hybrid architectures rather than retraining from scratch; this is far more practical than building entirely new long-context models.
This paper presents HyLo, a method to convert pretrained Transformer language models into hybrid architectures that combine Transformers with efficient linear sequence models (like Mamba2). By reusing existing model checkpoints and adding long-context training, HyLo extends context length by 32x while reducing memory use by 90%, enabling 2M-token processing on standard hardware.
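To make the architectural idea concrete, below is a minimal PyTorch sketch of a hybrid stack: a fraction of the pretrained attention blocks are kept, and the rest are swapped for linear sequence-mixing blocks whose recurrent state has fixed size, so memory no longer grows with context length. The class names (`AttentionBlock`, `LinearMixerBlock`), the `hybridize` helper, and the 1-in-4 attention ratio are illustrative assumptions; the toy gated recurrence only stands in for a real linear sequence model such as Mamba2 and is not HyLo's actual conversion recipe.

```python
# Illustrative sketch only: names, ratios, and the toy recurrence are assumptions,
# not the method described in the paper.
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """Standard Transformer block, as kept from the pretrained checkpoint."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))


class LinearMixerBlock(nn.Module):
    """Placeholder linear sequence model (stand-in for e.g. Mamba2):
    a gated recurrence with a fixed-size hidden state per channel."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Per-channel decay in (0, 1); the recurrent state never grows with length.
        self.decay_logit = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.in_proj(self.norm(x))
        a = torch.sigmoid(self.decay_logit)          # (d_model,)
        state = torch.zeros(x.size(0), x.size(-1), device=x.device, dtype=x.dtype)
        outs = []
        for t in range(x.size(1)):                   # constant-size state per step
            state = a * state + (1 - a) * h[:, t]
            outs.append(state)
        return x + self.out_proj(torch.stack(outs, dim=1))


def hybridize(blocks: nn.ModuleList, keep_attn_every: int = 4) -> nn.ModuleList:
    """Keep every `keep_attn_every`-th pretrained attention block and replace
    the rest with linear mixers; kept blocks retain their pretrained weights."""
    hybrid = []
    for i, blk in enumerate(blocks):
        if i % keep_attn_every == 0:
            hybrid.append(blk)                       # reuse pretrained attention
        else:
            hybrid.append(LinearMixerBlock(blk.norm1.normalized_shape[0]))
    return nn.ModuleList(hybrid)


if __name__ == "__main__":
    d_model, n_heads, n_layers = 64, 4, 8
    pretrained = nn.ModuleList(AttentionBlock(d_model, n_heads) for _ in range(n_layers))
    model = hybridize(pretrained)
    x = torch.randn(2, 128, d_model)                 # (batch, seq_len, d_model)
    for blk in model:
        x = blk(x)
    print(x.shape)                                   # torch.Size([2, 128, 64])
```

In a sketch like this, the newly introduced linear blocks would still need the long-context training pass the paper describes, since they start from fresh weights while the retained attention blocks carry over the pretrained checkpoint.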