A neural network design that combines recurrent elements with other architectural components to process sequential data more efficiently than standard transformers, typically by reducing the compute and memory cost of handling long sequences.
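
To make the definition concrete, here is a minimal sketch in plain Python of one way such a design could be composed. It is an illustration under stated assumptions, not a reference implementation: the definition does not say which "other architectural components" are used, so pairing a gated linear recurrence with causal sliding-window attention is an assumed choice, and the names `linear_recurrence`, `windowed_attention`, and `hybrid_block`, as well as the `decay` and `window` parameters, are hypothetical.

```python
import math


def linear_recurrence(xs, decay=0.9):
    """Recurrent element: h_t = decay * h_{t-1} + (1 - decay) * x_t.

    The state h has a fixed size, so the cost per step does not grow
    with sequence length.
    """
    h = [0.0] * len(xs[0])
    out = []
    for x in xs:
        h = [decay * hi + (1.0 - decay) * xi for hi, xi in zip(h, x)]
        out.append(h)
    return out


def windowed_attention(xs, window=4):
    """Assumed second component: causal dot-product self-attention that
    only looks at the most recent `window` positions (current one included)."""
    dim = len(xs[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for t, query in enumerate(xs):
        keys = xs[max(0, t - window + 1): t + 1]
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * key[i] for w, key in zip(weights, keys)) / total
            for i in range(dim)
        ])
    return out


def hybrid_block(xs, decay=0.9, window=4):
    """Hypothetical hybrid block: recurrence, then local attention,
    added back to the input as a residual connection."""
    mixed = windowed_attention(linear_recurrence(xs, decay), window)
    return [[xi + mi for xi, mi in zip(x, m)] for x, m in zip(xs, mixed)]


if __name__ == "__main__":
    sequence = [[float(t), 1.0] for t in range(6)]  # 6 steps, 2 features
    for step in hybrid_block(sequence):
        print(step)
```

In this sketch the recurrence carries long-range context in a fixed-size state, while attention is limited to a short window, so the work per step stays bounded by `window` instead of growing with the full sequence length as it does in full self-attention.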