Predicting how long an inference request will take to complete, accounting for hardware contention and concurrent execution.
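Below is a minimal sketch of one way such a prediction could be structured; it is not the system's actual model. All names and parameters (`RequestProfile`, `predict_latency_s`, the throughput numbers, and the linear contention slowdown) are illustrative assumptions: isolated latency is estimated from prefill and decode token counts, then inflated by a simple factor per co-running request.

```python
from dataclasses import dataclass


@dataclass
class RequestProfile:
    prompt_tokens: int    # tokens to prefill
    output_tokens: int    # tokens expected to be generated


def predict_latency_s(
    request: RequestProfile,
    concurrent_requests: int,
    prefill_tokens_per_s: float = 8000.0,   # assumed isolated prefill throughput
    decode_tokens_per_s: float = 60.0,      # assumed isolated decode throughput
    contention_slowdown: float = 0.15,      # assumed slowdown per extra co-running request
) -> float:
    """Estimate end-to-end latency for one request (hypothetical model).

    Isolated time = prefill time + decode time; contention from other
    requests sharing the accelerator is modeled as a linear slowdown,
    a deliberately simple assumption.
    """
    isolated = (
        request.prompt_tokens / prefill_tokens_per_s
        + request.output_tokens / decode_tokens_per_s
    )
    # Each additional concurrent request inflates latency by a fixed fraction.
    contention_factor = 1.0 + contention_slowdown * max(concurrent_requests - 1, 0)
    return isolated * contention_factor


if __name__ == "__main__":
    req = RequestProfile(prompt_tokens=1024, output_tokens=256)
    for n in (1, 4, 8):
        print(f"{n} concurrent request(s): {predict_latency_s(req, n):.2f} s")
```

In practice a linear slowdown is a coarse approximation; a real predictor would likely replace `contention_factor` with a function fitted to measured batching and memory-bandwidth behavior on the target hardware.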