A high-performance serving framework that efficiently runs language models and embedding models with optimized memory usage and throughput for production deployments.