Tuning a model to process more requests or tokens per second, sometimes at the cost of individual response quality or latency.