American Invitational Mathematics Examination 2024
15 challenging math competition problems from AIME 2024, used as a difficult math reasoning benchmark for frontier models.
The problems require significant mathematical insight and multi-step reasoning, and span algebra, geometry, number theory, and combinatorics. Every answer is an integer from 000 to 999.
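Because every answer is an integer from 000 to 999, grading can be done by exact match. The following is a minimal sketch of such a grader, not the method used to produce the scores on this page. The function names `parse_answer` and `accuracy` are hypothetical, and taking the last integer in the model output as its final answer is an assumption.

```python
import re
from typing import Optional, Sequence


def parse_answer(text: str) -> Optional[int]:
    """Return the last integer in `text` if it lies in 0..999, else None.

    Assumption: the model states its final answer last. Leading zeros
    ("073") are accepted and parsed as the integer 73.
    """
    matches = re.findall(r"\d+", text)
    if not matches:
        return None
    value = int(matches[-1])
    return value if 0 <= value <= 999 else None


def accuracy(outputs: Sequence[str], references: Sequence[int]) -> float:
    """Fraction of outputs whose parsed answer equals the reference answer."""
    if len(outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    if not references:
        raise ValueError("at least one problem is required")
    correct = sum(
        parse_answer(out) == ref for out, ref in zip(outputs, references)
    )
    return correct / len(references)
```

Under this scheme an unparseable or out-of-range output simply counts as wrong. On a single 15-problem run, 14 correct answers correspond to `round(14 / 15 * 100, 1)`, which is 93.3.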
Commercial API models (GPT-4o, Claude, Gemini) are not submitted to the Open LLM Leaderboard; their scores come from provider-reported benchmarks.
| # | Model | Score |
|---|---|---|
| 1 | o4-mini | 93.4% |
| 2 | Grok 3 | 93.3% |
| 3 | Gemini 2.5 Pro | 92.0% |
| 4 | o3 | 91.6% |
| 5 | Gemini 2.5 Flash | 88.0% |
| 6 | o3 Mini | 87.3% |
| 7 | Qwen3 235B A22B | 85.7% |
| 8 | Claude 3.7 Sonnet | 80.0% |
| 9 | DeepSeek R1 | 79.8% |
| 10 | o1 | 74.3% |
| 11 | DeepSeek V3 | 39.2% |
| 12 | GPT-4o | 12.0% |