A neural network architecture designed as an alternative to the standard transformer model, often with different trade-offs in speed, memory, or capability.
Performance retention over long documents and conversations