AIME 2025

American Invitational Mathematics Examination 2025

Math | Score: 0-100 (% correct) | 14 models scored

About

15 challenging competition problems from AIME 2025, used as a math reasoning benchmark for frontier models.

Methodology

15 problems that require significant mathematical insight and multi-step reasoning. Each answer is an integer from 000 to 999 (that is, 0 to 999 written as three digits). Problems span algebra, geometry, number theory, and combinatorics. Labs use the set as a quick-to-evaluate probe of math reasoning in frontier models.
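Because every answer is a single integer in a fixed range, grading reduces to exact integer matching, and the 0-100 score is the percentage of problems answered correctly. The sketch below illustrates that arithmetic. It is not this leaderboard's actual grader. The rule "take the last integer in the model's output" and the helper names `parse_answer` and `aime_score` are hypothetical assumptions chosen for illustration.

```python
import re
from typing import Optional, Sequence


def parse_answer(text: str) -> Optional[int]:
    """Hypothetical extraction rule: use the last integer in the output.

    Returns None if no integer is found or if it falls outside the
    valid AIME answer range 0-999.
    """
    matches = re.findall(r"\d+", text)
    if not matches:
        return None
    value = int(matches[-1])  # "042" and "42" both parse to 42
    return value if 0 <= value <= 999 else None


def aime_score(outputs: Sequence[str], answers: Sequence[int]) -> float:
    """Percent of problems answered correctly, on the 0-100 scale."""
    if len(outputs) != len(answers):
        raise ValueError("need exactly one model output per problem")
    # Exact integer match; a missing or out-of-range answer counts as wrong.
    correct = sum(parse_answer(o) == a for o, a in zip(outputs, answers))
    return 100.0 * correct / len(answers)
```

For example, `aime_score(["The answer is 042", "So we get 7.", "no idea"], [42, 8, 100])` counts one correct answer out of three and returns about 33.3. With 15 problems, a single run can only score in steps of 100/15 ≈ 6.67 points. Fractional values such as 94.6% therefore presumably reflect averaging over repeated runs, although the source does not say so.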

Dataset Website

Model Leaderboard

Shows open-weight models only. Commercial API models (GPT-4o, Claude, Gemini) are not submitted to the Open LLM Leaderboard; their scores come from provider-reported benchmarks.

#	Model	Score
1	MAI-Thinking-1	97.0%
2	GPT-5	94.6%
3	o4-mini	92.7%
4	Grok 4	91.7%
5	o3	88.9%