An inference engine that serves large language models efficiently by batching incoming requests and managing memory intelligently.
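
The batching idea can be sketched as follows. This is a minimal illustration, not the engine's actual API: the names `Request`, `Batcher`, and `max_batch_size` are hypothetical. The sketch shows the core mechanic of queueing incoming requests and draining them in fixed-size groups so the model can process several prompts in one forward pass.

```python
from dataclasses import dataclass
from collections import deque

# Hypothetical sketch: Request and Batcher are illustrative names,
# not part of any real engine's API.

@dataclass
class Request:
    request_id: int
    prompt: str

class Batcher:
    """Queues pending requests and drains them in groups of at most
    max_batch_size, so each model forward pass serves several prompts."""

    def __init__(self, max_batch_size: int = 4):
        self.max_batch_size = max_batch_size
        self.queue: deque = deque()

    def submit(self, req: Request) -> None:
        # New requests wait in a FIFO queue until the next batch is formed.
        self.queue.append(req)

    def next_batch(self) -> list:
        # Pull up to max_batch_size requests off the queue; a partial
        # batch is returned when fewer requests are waiting.
        batch = []
        while self.queue and len(batch) < self.max_batch_size:
            batch.append(self.queue.popleft())
        return batch

if __name__ == "__main__":
    b = Batcher(max_batch_size=2)
    for i in range(5):
        b.submit(Request(i, f"prompt {i}"))
    batches = []
    while (batch := b.next_batch()):
        batches.append([r.request_id for r in batch])
    print(batches)  # [[0, 1], [2, 3], [4]]
```

A production engine would go further (e.g. admitting new requests into a batch between decode steps rather than only between batches), but the queue-and-drain loop above is the basic shape of request batching.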