Multi-agent LLM systems reason better when they learn optimized latent communication channels instead of relying on fixed text-based protocols, yielding significant improvements on challenging benchmarks.
This paper introduces DiffMAS, a training framework that lets multiple AI agents learn to communicate through internal representations (such as key-value caches) rather than text. Because the latent channel is differentiable, reasoning and communication can be jointly optimized during training, so agents coordinate better on complex tasks such as math, science, and coding problems.
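The key property of a latent channel is that it is continuous and differentiable, so the task loss can backpropagate through the message from the receiving agent into the sending agent, optimizing both jointly. A toy sketch of this idea (not the paper's actual implementation; each "agent" here is a hypothetical single-parameter model, and the message is one scalar):

```python
# Toy sketch: two "agents" linked by a continuous latent message instead
# of discrete text. Because the channel is differentiable, the loss
# gradient flows back through agent B (receiver) into agent A (sender),
# so both are trained jointly end to end.

def train(steps=200, lr=0.05):
    w_sender, w_receiver = 0.1, 0.1   # hypothetical 1-parameter "agents"
    x, target = 1.0, 2.0              # input seen by the sender, task target
    y = 0.0
    for _ in range(steps):
        m = w_sender * x              # latent message (continuous, not tokens)
        y = w_receiver * m            # receiver's prediction from the message
        err = y - target              # squared-error loss is (y - target)**2
        # Gradients flow through the latent channel to BOTH agents:
        g_receiver = 2 * err * m
        g_sender = 2 * err * w_receiver * x
        w_receiver -= lr * g_receiver
        w_sender -= lr * g_sender
    return y

print(round(train(), 3))  # converges to the target, 2.0
```

With a discrete text channel, the sender's output would be sampled tokens, breaking this gradient path; that is the obstacle a differentiable latent channel sidesteps.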