A system that hosts trained ML models and processes incoming prediction requests on deployed hardware like GPUs.