Stacked blocks of neural network computations that process and transform input text progressively, with more layers generally allowing the model to learn more complex patterns.