The latency from receiving a request to generating the first output token, a key metric for interactive systems.