LLMs may appear competent on multiple-choice MRI benchmarks yet struggle markedly with free-text recall of vendor-specific operational knowledge; multiple-choice scores alone do not indicate readiness for real-world MRI protocol guidance.
This paper introduces MRI-Eval, a benchmark of 1,365 questions that tests LLM knowledge of MRI physics and GE scanner operation across three difficulty levels.