For distributed machine learning without a central server, this algorithm achieves state-of-the-art communication efficiency by coupling the number of gossip rounds with the batch size, which lets you train faster across a network while sending fewer total messages between nodes.
This paper presents MG-ADSGD, a decentralized learning algorithm in which multiple agents jointly optimize a shared problem while communicating only with their neighbors. The algorithm combines acceleration techniques with efficient message passing, so it reaches a solution with fewer total messages exchanged across the network than prior methods.
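To make "communicating only with neighbors" concrete, here is a minimal sketch of plain gossip averaging. It is not MG-ADSGD itself, whose update rule this summary does not give. It assumes a ring of six agents, each holding one scalar estimate, and equal mixing weights of 1/3 over a node and its two neighbors; the function names and the starting values are hypothetical and chosen only for illustration.

```python
def gossip_round(values):
    """One gossip round on a ring: each node replaces its value with the
    average of its own value and its two neighbors' values.
    (Assumed topology and weights, not taken from the paper.)"""
    n = len(values)
    return [
        (values[(i - 1) % n] + values[i] + values[(i + 1) % n]) / 3.0
        for i in range(n)
    ]


def multi_gossip(values, rounds):
    """Apply several gossip rounds in a row and return the mixed values."""
    for _ in range(rounds):
        values = gossip_round(values)
    return values


def disagreement(values):
    """Largest distance between any node's value and the network average."""
    mean = sum(values) / len(values)
    return max(abs(v - mean) for v in values)


# Hypothetical local estimates held by six agents.
local_estimates = [0.0, 4.0, 8.0, 12.0, 16.0, 20.0]

after_one = multi_gossip(local_estimates, 1)
after_five = multi_gossip(local_estimates, 5)

print("disagreement before:", disagreement(local_estimates))
print("disagreement after 1 round:", disagreement(after_one))
print("disagreement after 5 rounds:", disagreement(after_five))
```

Each round costs one message per neighbor link but preserves the network average, and more rounds pull the agents closer to agreement. That trade-off between messages sent and consensus quality is what any scheme that tunes the number of gossip rounds has to balance.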