Qwen2.5 14B Instruct sits in a practical middle ground: large enough to handle nuanced reasoning and multilingual tasks, yet compact enough to run on consumer hardware. It follows instructions reliably and handles structured outputs such as JSON or code with reasonable precision. Its Chinese-English bilingual capabilities are notably strong, reflecting Alibaba's training priorities. Its main weakness is occasional verbosity when brevity is called for.

| Benchmark | Score | Type | Recorded |
|---|---|---|---|
| MATH | 54.8 | accuracy | 29d ago |
| GPQA Diamond | 9.6 | accuracy | 29d ago |
| BBH | 48.4 | accuracy | 29d ago |
| MuSR | 10.2 | accuracy | 29d ago |
| IFEval | 81.6 | accuracy | 29d ago |
| MMLU-Pro | 43.4 | accuracy | 29d ago |
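
Because the model can wrap a requested JSON payload in extra prose, a caller may want to pull the payload out defensively instead of parsing the whole reply. The following is a minimal sketch of one way to do that using only the Python standard library. The helper name `extract_first_json` and the sample `reply` string are hypothetical, made up for illustration, and are not output from the model or part of any Qwen tooling.

```python
import json
from typing import Any, Optional


def extract_first_json(reply: str) -> Optional[Any]:
    """Return the first JSON object or array embedded in a reply, or None."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(reply):
        # Only positions that could open an object or array are worth trying.
        if ch not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing text after it.
            value, _end = decoder.raw_decode(reply, start)
        except json.JSONDecodeError:
            continue
        return value
    return None


# Hypothetical reply: a chatty preamble and sign-off around the JSON.
reply = (
    "Sure! Here is the result:\n"
    '{"language": "zh", "confidence": 0.9}\n'
    "Let me know if you need more."
)
print(extract_first_json(reply))
```

Scanning for the first parseable value keeps the helper independent of how the reply is phrased. A stricter pipeline might instead reject any reply that is not pure JSON and retry with a tighter prompt.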