Software that executes a trained model to generate predictions or outputs; vLLM, for example, is an optimized open-source inference engine for large language models.