LLMs may appear competent on multiple-choice MRI benchmarks yet struggle markedly with free-text recall of vendor-specific operational knowledge; multiple-choice scores alone do not indicate readiness for real-world MRI protocol guidance.
This paper introduces MRI-Eval, a benchmark of 1,365 questions that tests LLM knowledge of MRI physics and GE scanner operation across three difficulty levels.