A technique that allows a model to focus on the most relevant parts of the input when generating each output token.
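
The description above matches what is commonly called an attention mechanism. As a minimal sketch, assuming the scaled dot-product form (one common variant, not necessarily the one meant here), the code below scores each input position against a query, turns the scores into weights with a softmax, and returns the weighted mix of the inputs. The names `softmax` and `attend` and the toy vectors are illustrative assumptions, not taken from the source.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector (illustrative)."""
    scale = math.sqrt(len(query))
    # Relevance of each input position: dot product of the query with its key.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Output: average of the value vectors, weighted by relevance.
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy input: three positions, each with a 2-dimensional key and value.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [2.0, 0.0]  # points along the first dimension

weights, context = attend(query, keys, values)
print("weights:", [round(w, 3) for w in weights])
print("context:", [round(c, 3) for c in context])
```

In this toy setup the second position, whose key is orthogonal to the query, receives the smallest weight, so it contributes least to the output. In a real model the queries, keys, and values would come from learned projections, and this computation would be repeated for every output token.
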
Performance retention over long documents and conversations