Task accuracy and conversational awareness are separate capabilities: a model can answer a question correctly without understanding how users naturally respond to that answer, revealing a blind spot in current LLM evaluation. This paper probes that gap by asking models to generate the next user message following an assistant response, a task that requires modeling interaction flow rather than just producing correct outputs.
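
To make the probe concrete, here is a minimal sketch of what a next-user-turn elicitation might look like, assuming an OpenAI-compatible chat API. The model name, prompt wording, and the `probe_next_user_turn` helper are illustrative assumptions, not the paper's actual protocol.

```python
# Minimal sketch of a next-user-turn probe (illustrative, not the
# paper's actual setup). Assumes an OpenAI-compatible chat API and
# an API key in the environment.
from openai import OpenAI

client = OpenAI()

def probe_next_user_turn(
    user_msg: str,
    assistant_msg: str,
    model: str = "gpt-4o-mini",  # hypothetical choice of model
) -> str:
    """Ask the model to continue the conversation as the *user*,
    given one completed user/assistant exchange."""
    instruction = (
        "Below is one turn of a conversation between a user and an "
        "assistant. Write the most plausible next message FROM THE USER. "
        "Reply with the user's message only."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": instruction},
            {
                "role": "user",
                "content": (
                    f"User: {user_msg}\n"
                    f"Assistant: {assistant_msg}\n"
                    "Next user message:"
                ),
            },
        ],
    )
    return response.choices[0].message.content

# A correct answer typically draws a follow-up question or a topic
# shift, not a restatement of the answer itself.
print(probe_next_user_turn(
    "What's the capital of Australia?",
    "The capital of Australia is Canberra.",
))
```

The point of such a probe is that it cannot be answered from task knowledge alone: the model must predict how a real user would react to the assistant's message, which is exactly the conversational awareness the paper argues standard benchmarks fail to measure.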