Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines

Marcel Wagenländer, Otto White, Britannio Jarrett, Pedro Silvestre, Yanda Tao et al. | April 16, 2026 | arXiv

Key Takeaway

In agentic workflows that combine multiple LLMs, allocating resources according to each LLM's typical share of execution time is more effective than optimizing each LLM independently.

Summary

Scepsy is a system for serving multi-LLM agentic workflows efficiently on GPU clusters. Rather than treating each LLM independently, it profiles the share of execution time each LLM typically accounts for, then uses those shares to decide how many GPUs each LLM receives and how to parallelize its work. According to the authors, this approach achieves higher throughput and lower latency than existing methods.
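To make the allocation idea concrete, here is a minimal sketch in Python. It is not Scepsy's actual algorithm; it only illustrates one simple way to split a fixed GPU budget in proportion to profiled execution shares, using largest-remainder rounding and guaranteeing each LLM at least one GPU. The function name `allocate_gpus`, the model names, and the profiled times are all hypothetical.

```python
from math import floor


def allocate_gpus(shares: dict[str, float], total_gpus: int) -> dict[str, int]:
    """Split total_gpus across LLMs in proportion to their execution shares.

    shares maps a (hypothetical) LLM name to its profiled execution time or
    share; only the ratios between values matter.
    """
    if total_gpus < len(shares):
        raise ValueError("need at least one GPU per LLM")
    total_share = sum(shares.values())
    if total_share <= 0:
        raise ValueError("execution shares must sum to a positive value")

    # Reserve one GPU per LLM, then divide the remainder proportionally.
    spare = total_gpus - len(shares)
    ideal = {name: spare * s / total_share for name, s in shares.items()}
    alloc = {name: 1 + floor(x) for name, x in ideal.items()}

    # Largest-remainder rounding: hand leftover GPUs to the LLMs whose
    # ideal allocation was truncated the most.
    leftover = total_gpus - sum(alloc.values())
    by_remainder = sorted(
        shares, key=lambda n: ideal[n] - floor(ideal[n]), reverse=True
    )
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical profile: seconds of execution time per workflow run.
profile = {"planner": 2.0, "coder": 6.0, "critic": 2.0}
allocation = allocate_gpus(profile, total_gpus=8)
print(allocation)
```

With this hypothetical profile, the LLM that dominates execution time receives the largest slice of the cluster. A real system would also need to choose a parallelization strategy for each LLM and predict the resulting latency and throughput, which this sketch leaves out.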

agents efficiency

Key Terms

agentic-workflows tensor-parallelism gpu-allocation latency-throughput-predictor