Training models to handle multiple parallel computation streams, rather than sequential message exchanges, enables faster, more responsive AI agents: agents that can act while thinking and react to new information without waiting for earlier operations to complete.
This paper proposes Multi-Stream LLMs, which replace the single sequential message stream of current language models with multiple parallel streams for inputs, outputs, and reasoning. Models can then read and write simultaneously, think while acting, and process different kinds of information in parallel, addressing fundamental bottlenecks in how AI agents operate today.
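To make the contrast with a sequential message loop concrete, here is a minimal sketch of the idea using Python's `asyncio`: three concurrent streams (input, reasoning, output) connected by queues, so actions can be emitted while new observations are still arriving. All names (`read_stream`, `reason_stream`, `write_stream`) are illustrative assumptions, not the paper's actual architecture or API.

```python
import asyncio

async def read_stream(inputs, obs_q):
    # Ingest observations as they arrive, without blocking the other streams.
    for obs in inputs:
        await obs_q.put(obs)
        await asyncio.sleep(0)   # yield so reasoning/acting can interleave
    await obs_q.put(None)        # sentinel: input stream is exhausted

async def reason_stream(obs_q, act_q):
    # Turn each observation into an action; reasoning overlaps with I/O
    # instead of waiting for the full input to finish.
    while True:
        obs = await obs_q.get()
        if obs is None:
            await act_q.put(None)
            break
        await act_q.put(f"act({obs})")

async def write_stream(act_q, log):
    # Emit each action as soon as it is decided, while reading continues.
    while True:
        act = await act_q.get()
        if act is None:
            break
        log.append(act)

async def run(inputs):
    obs_q, act_q, log = asyncio.Queue(), asyncio.Queue(), []
    await asyncio.gather(
        read_stream(inputs, obs_q),
        reason_stream(obs_q, act_q),
        write_stream(act_q, log),
    )
    return log

print(asyncio.run(run(["a", "b", "c"])))  # → ['act(a)', 'act(b)', 'act(c)']
```

In a sequential loop, the agent would wait for the entire input before reasoning and for all reasoning before acting; here each stage hands work downstream as soon as it is ready, which is the bottleneck the paper targets.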