You can efficiently extend pretrained LLMs to handle much longer contexts by converting them into hybrid architectures rather than retraining from scratch; this is far more practical than building entirely new long-context models.
This paper presents HyLo, a method to convert pretrained Transformer language models into hybrid architectures that combine Transformers with efficient linear sequence models (like Mamba2). By reusing existing model checkpoints and adding long-context training, HyLo extends context length by 32x while reducing memory use by 90%, enabling 2M-token processing on standard hardware.
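To make the architectural idea concrete, below is a minimal PyTorch sketch of a hybrid stack: a fraction of the pretrained attention blocks are kept, and the rest are swapped for linear sequence-mixing blocks whose recurrent state has fixed size, so memory no longer grows with context length. The class names (`AttentionBlock`, `LinearMixerBlock`), the `hybridize` helper, and the 1-in-4 attention ratio are illustrative assumptions; the toy gated recurrence only stands in for a real linear sequence model such as Mamba2 and is not HyLo's actual conversion recipe.

```python
# Illustrative sketch only: names, ratios, and the toy recurrence are assumptions,
# not the method described in the paper.
import torch
import torch.nn as nn


class AttentionBlock(nn.Module):
    """Standard Transformer block, as kept from the pretrained checkpoint."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))


class LinearMixerBlock(nn.Module):
    """Placeholder linear sequence model (stand-in for e.g. Mamba2):
    a gated recurrence with a fixed-size hidden state per channel."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Per-channel decay in (0, 1); the recurrent state never grows with length.
        self.decay_logit = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.in_proj(self.norm(x))
        a = torch.sigmoid(self.decay_logit)          # (d_model,)
        state = torch.zeros(x.size(0), x.size(-1), device=x.device, dtype=x.dtype)
        outs = []
        for t in range(x.size(1)):                   # constant-size state per step
            state = a * state + (1 - a) * h[:, t]
            outs.append(state)
        return x + self.out_proj(torch.stack(outs, dim=1))


def hybridize(blocks: nn.ModuleList, keep_attn_every: int = 4) -> nn.ModuleList:
    """Keep every `keep_attn_every`-th pretrained attention block and replace
    the rest with linear mixers; kept blocks retain their pretrained weights."""
    hybrid = []
    for i, blk in enumerate(blocks):
        if i % keep_attn_every == 0:
            hybrid.append(blk)                       # reuse pretrained attention
        else:
            hybrid.append(LinearMixerBlock(blk.norm1.normalized_shape[0]))
    return nn.ModuleList(hybrid)


if __name__ == "__main__":
    d_model, n_heads, n_layers = 64, 4, 8
    pretrained = nn.ModuleList(AttentionBlock(d_model, n_heads) for _ in range(n_layers))
    model = hybridize(pretrained)
    x = torch.randn(2, 128, d_model)                 # (batch, seq_len, d_model)
    for blk in model:
        x = blk(x)
    print(x.shape)                                   # torch.Size([2, 128, 64])
```

In a sketch like this, the newly introduced linear blocks would still need the long-context training pass the paper describes, since they start from fresh weights while the retained attention blocks carry over the pretrained checkpoint.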