For agentic workflows with multiple LLMs, predicting and allocating resources based on each LLM's typical execution share is more effective than optimizing each LLM independently.
Scepsy is a system for efficiently running multi-LLM agentic workflows on GPU clusters. Instead of treating each LLM independently, it profiles how much execution time each LLM typically uses, then uses this information to intelligently allocate GPUs and decide how to parallelize work. This approach achieves much higher throughput and lower latency than existing methods.