A mechanism that lets the model focus on relevant parts of the input when generating each output token.
Performance retention over long documents and conversations