Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization

Ming Sun, Kun Yuan | June 5, 2026 | arXiv

Key Takeaway

For distributed machine learning without a central server, this algorithm achieves state-of-the-art communication efficiency by coupling gossip rounds with batch sizes, so you can train faster across a network while sending fewer total messages between nodes.

Summary

This paper presents MG-ADSGD, a decentralized learning algorithm in which multiple agents optimize a shared problem while communicating only with their neighbors. By combining acceleration techniques with efficient message passing, it needs fewer total messages across the network than prior methods to reach a solution.
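
To make the neighbor-only communication concrete, here is a minimal sketch of plain gossip averaging in Python. It is not the MG-ADSGD algorithm itself: the ring topology, the uniform 1/3 mixing weights, and the helper names (ring_mixing_matrix, gossip, disagreement) are all assumptions chosen for illustration. Each agent repeatedly replaces its value with a weighted average of its own value and its two neighbors' values, and more rounds bring the agents closer to the network-wide mean at the cost of more messages.

```python
def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring of n >= 3 agents.

    Assumption for illustration: each agent gives weight 1/3 to itself
    and to each of its two ring neighbors.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def gossip(values, W, rounds):
    """Run `rounds` gossip rounds; each round is one exchange with neighbors."""
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        # Every agent i mixes its value with those of its neighbors.
        x = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


def disagreement(x):
    """Largest distance of any agent's value from the network average."""
    mean = sum(x) / len(x)
    return max(abs(v - mean) for v in x)


if __name__ == "__main__":
    local_values = [1.0, 2.0, 3.0, 4.0, 10.0]  # one scalar per agent
    W = ring_mixing_matrix(len(local_values))
    for rounds in (0, 1, 5, 20):
        mixed = gossip(local_values, W, rounds)
        print(rounds, disagreement(mixed))
```

Because the mixing matrix is doubly stochastic, the network average is preserved in every round while the disagreement between agents shrinks. The number of rounds is the knob that trades consensus accuracy against messages sent, which is the quantity the paper's communication-efficiency claim concerns.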

training efficiency scaling

Key Terms

decentralized-training communication-efficiency strongly-convex gossip-averaging condition-number