In clinical AI, safety requires deliberate design choices around evidence quality and retrieval strategy, not just model scaling. A few high-risk errors matter more than average performance.
The paper shows that making clinical AI models bigger or faster does not automatically make them safer: safety and accuracy follow different scaling behavior. The authors evaluated 34 medical AI models and found that high-quality evidence substantially improved both accuracy and safety, while standard retrieval methods and additional inference-time compute did not prevent dangerous errors or overconfidence.
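The point that a few high-risk errors can outweigh average performance can be sketched with a toy scoring function. All data, harm categories, and weights below are hypothetical illustrations, not values from the paper:

```python
# Toy illustration: two models with identical accuracy can differ
# sharply once errors are weighted by clinical harm severity.
# All data and weights here are hypothetical.

def accuracy(errors):
    """Fraction of correct answers; None marks a correct response."""
    return sum(e is None for e in errors) / len(errors)

def harm_weighted_error(errors, weights):
    """Average error cost, with high-risk mistakes weighted heavily."""
    return sum(weights[e] for e in errors if e is not None) / len(errors)

# Hypothetical harm weights: severe errors dominate the score.
WEIGHTS = {"minor": 1, "moderate": 5, "severe": 50}

# Model A makes 2 minor errors; Model B makes 2 severe errors.
model_a = [None] * 8 + ["minor", "minor"]
model_b = [None] * 8 + ["severe", "severe"]

print(accuracy(model_a), accuracy(model_b))  # both 0.8
print(harm_weighted_error(model_a, WEIGHTS),
      harm_weighted_error(model_b, WEIGHTS))  # 0.2 vs 10.0
```

Both models score 80% accuracy, yet under harm weighting Model B is 50 times worse, which is why average-performance benchmarks alone cannot certify clinical safety.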