How quickly a model can generate predictions or outputs after being given an input, measured in time per token or tokens per second.