The ability of a model to generate outputs quickly while consuming few computational resources (such as processor time, memory, and energy) when deployed in real-world use.
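As a rough illustration, the two halves of this definition are often quantified as speed (per-call latency and throughput) and resource use (for example, peak memory). The sketch below assumes a model exposed as a plain Python callable. `measure_inference` and `toy_model` are hypothetical names used only for illustration, and `tracemalloc` sees only Python-heap allocations, not GPU memory or energy.

```python
import time
import tracemalloc
from statistics import mean

def measure_inference(model_fn, inputs, warmup=2):
    """Report mean latency, throughput, and peak Python-heap memory for model_fn."""
    # Warm-up calls absorb one-time costs (lazy initialization, caches).
    for x in inputs[:warmup]:
        model_fn(x)

    # Speed: time each call with a high-resolution monotonic clock.
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        model_fn(x)
        latencies.append(time.perf_counter() - start)

    # Resource use: trace allocations in a separate pass so tracing overhead
    # does not inflate the timings above.
    tracemalloc.start()
    for x in inputs:
        model_fn(x)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    total = sum(latencies)
    return {
        "mean_latency_s": mean(latencies),
        # Sequential (single-stream) throughput: calls completed per second.
        "throughput_per_s": len(latencies) / total if total > 0 else float("inf"),
        "peak_memory_bytes": peak,
    }

def toy_model(n):
    # Hypothetical stand-in for a real model's forward pass.
    return [i * i for i in range(n)]

stats = measure_inference(toy_model, [10_000] * 20)
print(stats)
```

Because calls run one at a time here, throughput is just the inverse of mean latency. Batched or concurrent serving can raise throughput without lowering per-call latency, so the two are usually reported separately.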