SciCode

coding | Score: 0-100 (% correct) | 12 models scored

About

Tests models on research-level scientific programming problems drawn from real scientific papers across physics, chemistry, biology, and mathematics.

Methodology

Each problem requires the model to implement an algorithm described in the scientific literature; the implementation is then checked for correctness against test cases. Problems cover numerical methods, simulations, and domain-specific computations across STEM fields.
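As a rough illustration of the "implement, then verify against test cases" loop, here is a minimal Python sketch. It is not the SciCode harness: the function names (`trapezoid`, `passes`, `percent_correct`), the tolerance, and the test cases are all assumptions chosen for illustration, and the composite trapezoidal rule stands in for whatever algorithm a real problem would specify.

```python
import math
from typing import Callable, Sequence, Tuple

Case = Tuple[tuple, float]  # (arguments, expected value); hypothetical format


def trapezoid(f: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Composite trapezoidal rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)


def passes(candidate: Callable[..., float], cases: Sequence[Case],
           rel_tol: float = 1e-4) -> bool:
    """A candidate passes only if every test case matches within tolerance."""
    return all(
        math.isclose(candidate(*args), expected, rel_tol=rel_tol)
        for args, expected in cases
    )


def percent_correct(results: Sequence[bool]) -> float:
    """Score on a 0-100 scale: share of problems whose tests all pass."""
    return 100.0 * sum(results) / len(results)


# Illustrative test cases with known closed-form answers (assumed, not from SciCode).
cases = [
    ((math.sin, 0.0, math.pi, 1000), 2.0),      # integral of sin on [0, pi]
    ((lambda x: x * x, 0.0, 3.0, 1000), 9.0),   # integral of x^2 on [0, 3]
]

good = passes(trapezoid, cases)                  # correct implementation
bad = passes(lambda f, a, b, n: 0.0, cases)      # deliberately wrong candidate
score = percent_correct([good, bad])
```

Requiring every test case to pass before a problem counts as solved is one plausible reading of "% correct"; a partial-credit scheme would change `percent_correct` accordingly.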


Model Leaderboard

Shows open-weight models only. Commercial API models (GPT-4o, Claude, Gemini) are not submitted to the Open LLM Leaderboard; their scores come from provider-reported benchmarks.

#	Model	Score
1	o3 Mini	10.8%
2	DeepSeek R1	4.6%
3	Claude 3.5 Sonnet	4.6%
4	DeepSeek V3	3.1%
5	Llama 3.1 405B Instruct