Role-based conditioning significantly improves a voice agent's decisions about when to speak in group conversations, a critical capability for real-time multi-party voice applications such as meeting assistants and collaborative AI systems.
This paper presents ModeratorLM, a voice agent that improves turn-taking in multi-party conversations by conditioning behavior on an explicitly assigned role (e.g., moderator, participant). The system uses a speech language model with streaming processing and optional chain-of-thought reasoning.
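To make the idea of role conditioning concrete, the following is a minimal sketch, not the ModeratorLM implementation. It assumes text chunks stand in for streamed audio, and that a role description is prepended to the running transcript before each speak/stay-silent decision. The names `ROLE_PROMPTS`, `Chunk`, `build_context`, `stream_decisions`, and `toy_policy` are hypothetical, and `toy_policy` is a keyword rule used in place of a speech language model. The optional chain-of-thought step is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

# Hypothetical role descriptions. The source only names "moderator" and
# "participant" as example roles; the wording here is an assumption.
ROLE_PROMPTS = {
    "moderator": "You are the moderator. Speak to manage the floor.",
    "participant": "You are a participant. Speak when addressed.",
}


@dataclass
class Chunk:
    """One incremental piece of the conversation (text stands in for audio)."""
    speaker: str
    text: str


def build_context(role: str, history: List[Chunk]) -> str:
    """Prefix the transcript so far with the assigned role description."""
    lines = [ROLE_PROMPTS[role]]
    lines += [f"{c.speaker}: {c.text}" for c in history]
    return "\n".join(lines)


def stream_decisions(
    role: str,
    chunks: Iterable[Chunk],
    should_speak: Callable[[str], bool],
) -> Iterator[bool]:
    """Yield one speak/stay-silent decision after each incoming chunk."""
    history: List[Chunk] = []
    for chunk in chunks:
        history.append(chunk)
        yield should_speak(build_context(role, history))


def toy_policy(context: str) -> bool:
    """Stand-in for the model: a keyword rule, not a learned policy."""
    last_line = context.splitlines()[-1].lower()
    if context.startswith(ROLE_PROMPTS["moderator"]):
        return "?" in last_line      # toy rule: moderator reacts to questions
    return "agent" in last_line      # toy rule: participant waits to be addressed
```

Feeding the same conversation through `stream_decisions` with different roles produces different decision sequences, which is the behavior role conditioning is meant to control. In a real system the `should_speak` callable would wrap the model's prediction rather than a keyword rule.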