Hallucinations in medical AI arise at different stages (visual perception, knowledge recall, reasoning); diagnosing which stage fails lets you fix the right problem and improve trustworthiness.
ClinHallu is a benchmark of 7,031 medical cases that diagnoses where hallucinations occur in medical AI systems: in misreading images, recalling wrong medical facts, or reasoning incorrectly. It includes detailed reasoning traces and shows that training on these traces reduces errors.
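To make the idea of stage-level diagnosis concrete, here is a minimal Python sketch of how per-case labels could be summarized into a breakdown by failing stage. The `CaseResult` record, the `failed_stage` field, the stage names, and the `stage_breakdown` helper are all hypothetical illustrations; they are not ClinHallu's actual schema or evaluation code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Assumed stage labels, taken from the three sources named in the summary.
STAGES = ("visual", "knowledge", "reasoning")


@dataclass
class CaseResult:
    """Hypothetical per-case record; not the benchmark's real format."""
    case_id: str
    failed_stage: Optional[str]  # None means no hallucination was found


def stage_breakdown(results: Iterable[CaseResult]) -> Dict[str, float]:
    """Return the share of hallucinated cases attributed to each stage."""
    counts = Counter(
        r.failed_stage for r in results if r.failed_stage is not None
    )
    unknown = set(counts) - set(STAGES)
    if unknown:
        raise ValueError(f"unexpected stage labels: {sorted(unknown)}")
    total = sum(counts.values())
    # With no hallucinated cases, every share is reported as 0.0.
    return {s: (counts[s] / total if total else 0.0) for s in STAGES}


if __name__ == "__main__":
    demo = [
        CaseResult("c1", "visual"),
        CaseResult("c2", "visual"),
        CaseResult("c3", "knowledge"),
        CaseResult("c4", "reasoning"),
        CaseResult("c5", None),  # correct answer, excluded from the shares
    ]
    print(stage_breakdown(demo))
```

A breakdown like this suggests where to spend effort: a model whose errors are mostly visual likely needs better image grounding, while one dominated by knowledge or reasoning errors points to different fixes.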