Medical VLMs need explicit training on input validation (checking modality, anatomy, and orientation) as a separate safety step before diagnosis, not as an afterthought: current models hallucinate plausible reports even on obviously invalid inputs.
This paper reveals a critical blind spot in medical AI: vision-language models generate fluent medical reports even when given invalid inputs such as the wrong body part or an upside-down image. MedObvious is a benchmark of 1,880 tasks that tests whether models perform these basic sanity checks before attempting diagnosis, a step human radiologists carry out automatically but that VLMs currently fail.
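To make the "validation as a separate step" idea concrete, here is a minimal Python sketch of a validate-then-diagnose gate. This is an illustration of the pipeline shape the takeaway argues for, not the paper's implementation; the checker functions (`check_modality`, `check_anatomy`, `check_orientation`), the `dict`-based image record, and the `diagnose` callback are all hypothetical stand-ins for whatever classifiers or heuristics a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ValidationResult:
    ok: bool
    reason: Optional[str] = None


# Hypothetical checkers: in practice each would be a small classifier or
# heuristic over the actual pixels; here they just read labeled metadata.
def check_modality(image: dict, expected: str) -> ValidationResult:
    found = image.get("modality")
    if found != expected:
        return ValidationResult(False, f"expected {expected} modality, got {found}")
    return ValidationResult(True)


def check_anatomy(image: dict, expected: str) -> ValidationResult:
    found = image.get("anatomy")
    if found != expected:
        return ValidationResult(False, f"expected {expected} anatomy, got {found}")
    return ValidationResult(True)


def check_orientation(image: dict) -> ValidationResult:
    if image.get("upside_down"):
        return ValidationResult(False, "image appears upside-down")
    return ValidationResult(True)


def generate_report(image: dict, diagnose: Callable[[dict], str],
                    expected_modality: str, expected_anatomy: str) -> str:
    """Run explicit input validation BEFORE diagnosis; refuse on any failure."""
    for result in (check_modality(image, expected_modality),
                   check_anatomy(image, expected_anatomy),
                   check_orientation(image)):
        if not result.ok:
            return f"REFUSED: {result.reason}"
    return diagnose(image)


if __name__ == "__main__":
    # An obviously invalid input: a photo of a knee sent to a chest X-ray reader.
    image = {"modality": "photo", "anatomy": "knee", "upside_down": False}
    print(generate_report(image, lambda img: "Normal chest X-ray.",
                          expected_modality="x-ray", expected_anatomy="chest"))
    # -> REFUSED: expected x-ray modality, got photo
```

The design point is that refusal happens in a gate the diagnostic model never bypasses, rather than hoping the report generator notices the mismatch on its own, which is exactly the failure mode the benchmark probes.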