The dominant neural network architecture for modern language models; it processes sequences with self-attention, which lets every position attend directly to every other position instead of relying on recurrence.
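The self-attention step can be sketched as a single-head, scaled dot-product computation. This is a minimal illustration, not a full Transformer layer: it omits the learned query/key/value projections, multiple heads, and positional encodings, and simply uses the input itself as queries, keys, and values.

```python
import numpy as np

def self_attention(x):
    """Minimal single-head scaled dot-product self-attention.

    x: (seq_len, d) array. Queries, keys, and values are all
    taken to be x itself (no learned projections) -- a
    simplification for illustration only.
    """
    d = x.shape[-1]
    # Pairwise similarity between every position and every other,
    # scaled by sqrt(d) to keep the softmax well-behaved.
    scores = x @ x.T / np.sqrt(d)          # (seq_len, seq_len)
    # Row-wise softmax turns similarities into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted mix of all positions.
    return weights @ x

x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
out = self_attention(x)
print(out.shape)  # (3, 2)
```

Each output row is a convex combination of the input rows, weighted by similarity, so context from the whole sequence flows into every position in one step.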