Multi-agent LLM systems reason better when they learn optimized latent communication channels instead of relying on fixed text-based protocols, yielding significant improvements on challenging benchmarks.
This paper introduces DiffMAS, a training framework that lets multiple AI agents learn to communicate through internal representations (such as key-value caches) rather than text. Because the latent channel is differentiable, reasoning and communication can be jointly optimized during training, so agents coordinate better on complex tasks such as math, science, and coding problems.
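The key property of a latent channel is that it is continuous and differentiable, so the task loss can backpropagate through the message from the receiving agent into the sending agent, optimizing both jointly. A toy sketch of this idea (not the paper's actual implementation; each "agent" here is a hypothetical single-parameter model, and the message is one scalar):

```python
# Toy sketch: two "agents" linked by a continuous latent message instead
# of discrete text. Because the channel is differentiable, the loss
# gradient flows back through agent B (receiver) into agent A (sender),
# so both are trained jointly end to end.

def train(steps=200, lr=0.05):
    w_sender, w_receiver = 0.1, 0.1   # hypothetical 1-parameter "agents"
    x, target = 1.0, 2.0              # input seen by the sender, task target
    y = 0.0
    for _ in range(steps):
        m = w_sender * x              # latent message (continuous, not tokens)
        y = w_receiver * m            # receiver's prediction from the message
        err = y - target              # squared-error loss is (y - target)**2
        # Gradients flow through the latent channel to BOTH agents:
        g_receiver = 2 * err * m
        g_sender = 2 * err * w_receiver * x
        w_receiver -= lr * g_receiver
        w_sender -= lr * g_sender
    return y

print(round(train(), 3))  # converges to the target, 2.0
```

With a discrete text channel, the sender's output would be sampled tokens, breaking this gradient path; that is the obstacle a differentiable latent channel sidesteps.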