The speed at which a model generates output tokens one at a time, a critical bottleneck in long-context scenarios.