Current state-of-the-art models achieve only 69-78% accuracy on Olympiad-level math problems, and embedding models struggle to retrieve mathematically equivalent problems, showing that both mathematical reasoning and math-aware retrieval remain open challenges for AI systems.
MathNet is a large-scale benchmark with 30,676 Olympiad-level math problems across 17 languages and 47 countries, designed to evaluate both how well AI models solve math problems and how well they retrieve similar problems. The benchmark reveals that even top models struggle with complex reasoning, and that retrieval quality significantly impacts performance in retrieval-augmented problem solving.
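For context, a retrieval evaluation in this setting typically encodes each problem with an embedding model and checks whether a mathematically equivalent problem is ranked near the top. The sketch below illustrates that idea with a Recall@k metric; the example problem pairs, the all-MiniLM-L6-v2 model, and the convention that query i's equivalent sits at corpus index i are placeholders for illustration, not MathNet's actual data format or protocol.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical (query, equivalent-problem) pairs -- MathNet's real format may differ.
pairs = [
    ("Find all integers n such that n^2 + n + 41 is prime.",
     "Determine every integer n for which n^2 + n + 41 is a prime number."),
    ("Prove that sqrt(2) is irrational.",
     "Show that there is no rational number whose square equals 2."),
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

queries = [q for q, _ in pairs]
corpus = [d for _, d in pairs]

# Normalized embeddings so that dot product equals cosine similarity.
q_emb = model.encode(queries, normalize_embeddings=True)
c_emb = model.encode(corpus, normalize_embeddings=True)
sims = q_emb @ c_emb.T  # similarity matrix: queries x corpus


def recall_at_k(sims: np.ndarray, k: int = 1) -> float:
    """Fraction of queries whose true equivalent appears in the top-k results.

    Assumes query i's equivalent problem is stored at corpus index i.
    """
    top_k = np.argsort(-sims, axis=1)[:, :k]
    hits = sum(i in top_k[i] for i in range(sims.shape[0]))
    return hits / sims.shape[0]


print(f"Recall@1: {recall_at_k(sims, k=1):.2f}")
```

The same ranked lists can feed a retrieval-augmented solver (retrieved problems and solutions prepended to the prompt), which is why weak math-aware retrieval translates directly into weaker downstream problem solving.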