An optimized attention mechanism that produces the same outputs as standard attention, but runs faster and uses less memory because it reorganizes the order in which the computation is performed rather than approximating it.
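The description above does not say how the computation is reorganized. As a minimal sketch, assuming the reorganization is block-by-block processing of keys and values with a running ("online") softmax, as in FlashAttention-style kernels, the following pure-Python example handles a single query vector. The function names and the `block_size` parameter are hypothetical and chosen only for illustration. It shows why the result can match standard attention: each block's contribution is rescaled whenever a larger score is seen, so the full list of scores never has to be held at once.

```python
import math


def standard_attention(q, keys, values):
    """Reference version: computes every score first, then one softmax."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def blockwise_attention(q, keys, values, block_size=2):
    """Same output, but keys/values are consumed one block at a time."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf    # largest score seen so far
    running_sum = 0.0          # softmax denominator, relative to running_max
    acc = [0.0] * dim          # unnormalized weighted sum of values

    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]

        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum
        # (exp(-inf) is 0.0, so the first block starts from zero).
        correction = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]

        running_sum = running_sum * correction + sum(weights)
        acc = [a * correction + sum(w * v[j] for w, v in zip(weights, v_blk))
               for j, a in enumerate(acc)]
        running_max = new_max

    return [a / running_sum for a in acc]


if __name__ == "__main__":
    q = [1.0, 0.5]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0], [2.0, 0.3]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    ref = standard_attention(q, keys, values)
    blk = blockwise_attention(q, keys, values, block_size=2)
    print(all(abs(a - b) < 1e-9 for a, b in zip(ref, blk)))
```

This sketch only demonstrates the arithmetic equivalence. It is not faster in Python; in practice the speed and memory gains of such kernels come from keeping each block in fast on-chip memory on the accelerator, which this example does not model.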
Performance retention over long documents and conversations