LLMs deployed for medical advice have hidden, consistent ethical biases that don't reflect real physician diversity; without explicit auditing and balancing, a single model's values could be imposed at scale to thousands of patients.
This paper audits how large language models handle ethical dilemmas in medicine, revealing that while models discuss multiple ethical perspectives in their reasoning, they make near-identical decisions across repeated attempts.