Current state-of-the-art models achieve only 69-78% accuracy on Olympiad-level math problems, and embedding models struggle to retrieve mathematically equivalent problems, showing that both mathematical reasoning and math-aware retrieval remain open challenges for AI systems.
MathNet is a large-scale benchmark with 30,676 Olympiad-level math problems across 17 languages and 47 countries, designed to evaluate both how well AI models solve math problems and how well they retrieve similar problems. The benchmark reveals that even top models struggle with complex reasoning, and that retrieval quality significantly impacts performance in retrieval-augmented problem solving.
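For context, a retrieval evaluation in this setting typically encodes each problem with an embedding model and checks whether a mathematically equivalent problem is ranked near the top. The sketch below illustrates that idea with a Recall@k metric; the example problem pairs, the all-MiniLM-L6-v2 model, and the convention that query i's equivalent sits at corpus index i are placeholders for illustration, not MathNet's actual data format or protocol.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical (query, equivalent-problem) pairs -- MathNet's real format may differ.
pairs = [
    ("Find all integers n such that n^2 + n + 41 is prime.",
     "Determine every integer n for which n^2 + n + 41 is a prime number."),
    ("Prove that sqrt(2) is irrational.",
     "Show that there is no rational number whose square equals 2."),
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

queries = [q for q, _ in pairs]
corpus = [d for _, d in pairs]

# Normalized embeddings so that dot product equals cosine similarity.
q_emb = model.encode(queries, normalize_embeddings=True)
c_emb = model.encode(corpus, normalize_embeddings=True)
sims = q_emb @ c_emb.T  # similarity matrix: queries x corpus


def recall_at_k(sims: np.ndarray, k: int = 1) -> float:
    """Fraction of queries whose true equivalent appears in the top-k results.

    Assumes query i's equivalent problem is stored at corpus index i.
    """
    top_k = np.argsort(-sims, axis=1)[:, :k]
    hits = sum(i in top_k[i] for i in range(sims.shape[0]))
    return hits / sims.shape[0]


print(f"Recall@1: {recall_at_k(sims, k=1):.2f}")
```

The same ranked lists can feed a retrieval-augmented solver (retrieved problems and solutions prepended to the prompt), which is why weak math-aware retrieval translates directly into weaker downstream problem solving.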