American Invitational Mathematics Examination 2025
AIME 2025 consists of 15 competition problems that require significant mathematical insight and multi-step reasoning, spanning algebra, geometry, number theory, and combinatorics. Each answer is an integer from 000 to 999. Labs use the set as a difficult, quick-to-evaluate math reasoning benchmark for frontier models.
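Because every answer is an integer from 000 to 999, scoring reduces to an exact-match check. The following is a minimal sketch of such a scorer, not the harness behind any score on this page. The function names `parse_aime_answer` and `accuracy` are hypothetical, and the reference answers in the usage lines are placeholders rather than real AIME 2025 answers.

```python
import re
from typing import List, Optional


def parse_aime_answer(text: str) -> Optional[int]:
    """Parse a model's final answer; return None if it is not 1-3 ASCII digits."""
    stripped = text.strip()
    if not re.fullmatch(r"[0-9]{1,3}", stripped):
        return None
    # Leading zeros are allowed, so "042" and "42" are the same answer.
    return int(stripped)


def accuracy(predictions: List[str], reference: List[int]) -> float:
    """Fraction of problems answered with exactly the reference integer."""
    if len(predictions) != len(reference):
        raise ValueError("predictions and reference must have the same length")
    if not reference:
        raise ValueError("reference must not be empty")
    correct = sum(
        parse_aime_answer(pred) == ref  # a malformed answer parses to None and never matches
        for pred, ref in zip(predictions, reference)
    )
    return correct / len(reference)


# Placeholder data for illustration only (not real AIME answers).
placeholder_reference = [70, 588, 16]
placeholder_predictions = ["070", "588", "17"]
score = accuracy(placeholder_predictions, placeholder_reference)
print(f"{score:.1%}")
```

With 15 problems, a single run can only land on multiples of 1/15 (about 6.7 percentage points). Finer-grained figures would therefore need some form of aggregation, such as averaging over several sampled runs. Whether the scores on this page were produced that way is an assumption; the page does not say.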
Commercial API models (GPT-4o, Claude, Gemini) are not submitted to the Open LLM Leaderboard; their scores come from provider-reported benchmarks.
| Rank | Model | Score |
|---|---|---|
| 1 | GPT-5 | 94.6% |
| 2 | o4-mini | 92.7% |
| 3 | Grok 4 | 91.7% |
| 4 | o3 | 88.9% |
| 5 | DeepSeek R1 | 87.5% |
| 6 | Grok 3 | 86.7% |
| 7 | Gemini 2.5 Pro | 86.7% |
| 8 | o3-mini | 86.5% |
| 9 | Qwen3 235B A22B | 81.5% |
| 10 | o1 | 79.2% |
| 11 | Claude Opus 4 | 75.5% |
| 12 | Gemini 2.5 Flash | 72.0% |
| 13 | Claude Sonnet 4 | 70.5% |