A model built on the transformer architecture, which uses attention mechanisms to weigh how strongly each part of the input relates to every other part.
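To make "attention" concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It assumes the simplest possible setup: the input vectors are used directly as queries, keys, and values, whereas real transformers typically apply learned projections first and run several attention heads in parallel. The function names and the toy numbers are illustrative only.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns the output vectors and the attention weights per query.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy input: three positions with two-dimensional vectors (made-up numbers).
# Assumption: queries = keys = values = the raw input (no learned projections).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)

for position, row in enumerate(weights):
    print(position, [round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and shows how much one position draws on every other position. In this toy input, the first position attends more to the first and third vectors than to the second, because it has a larger dot product with them.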