ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents

Yuxing Lu, Yushuhong Lin, Wenqi Shi, J. Ben Tamo, Xukai Zhao et al. | June 1, 2026 | arXiv

Key Takeaway

Current LLMs struggle with clinical decision-making not only in what they decide but also in how they gather information: they ask redundant questions and fail at management decisions even when their diagnoses are correct, a gap that outcome-only evaluation cannot reveal.

Summary

ClinEnv is an interactive benchmark that tests how well AI models act as doctors by simulating real patient cases over multiple decision stages. Unlike static medical benchmarks, it requires models to actively gather information from specialized agents before making treatment decisions, and scores both the quality of decisions and the process of gathering information.
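
To make the idea of scoring both decisions and information gathering concrete, here is a minimal toy sketch in Python. It is not ClinEnv's implementation or API: the class names, the redundancy rule (a query repeated within the same stage), the clinical items, and the metric names are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    # Facts held by hypothetical specialist agents, keyed by (agent, item).
    records: dict
    correct_decision: str


@dataclass
class EpisodeLog:
    n_queries: int = 0
    n_redundant: int = 0
    decisions: list = field(default_factory=list)


def run_episode(stages, policy):
    """Walk through the stages: the policy queries agents, then decides."""
    log = EpisodeLog()
    known = {}
    for stage in stages:
        asked = set()  # queries already issued in this stage
        for query in policy.queries(known):
            log.n_queries += 1
            if query in asked:
                # Assumed redundancy rule: same question twice in one stage.
                log.n_redundant += 1
            asked.add(query)
            if query in stage.records:
                known[query] = stage.records[query]
        log.decisions.append(policy.decide(known))
    return log


def score(stages, log):
    """Report an outcome metric and a process metric side by side."""
    correct = sum(d == s.correct_decision for d, s in zip(log.decisions, stages))
    rate = log.n_redundant / log.n_queries if log.n_queries else 0.0
    return {"decision_accuracy": correct / len(stages), "redundant_query_rate": rate}


class RepeatingPolicy:
    """Toy policy that asks the lab agent the same question twice per stage."""

    def queries(self, known):
        return [("lab", "lactate"), ("lab", "lactate"), ("nursing", "vitals")]

    def decide(self, known):
        return "start_antibiotics" if known.get(("lab", "lactate")) == "high" else "observe"


# Two made-up stages of one case; values are illustrative, not clinical guidance.
stages = [
    Stage({("lab", "lactate"): "high", ("nursing", "vitals"): "stable"}, "start_antibiotics"),
    Stage({("lab", "lactate"): "normal", ("nursing", "vitals"): "stable"}, "observe"),
]
result = score(stages, run_episode(stages, RepeatingPolicy()))
print(result)
```

In this toy run the policy gets every decision right yet repeats one of its three queries in each stage, so an outcome-only score would look perfect while the process score exposes the wasted questions.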

evaluation agents applications

Key Terms

ehr-embedded-ai-agent multi-agent-framework information-acquisition sequential-decision-making ontology-grounded-matching