LiveCodeBench

coding
Score: 0-100 (% pass@1)
3 models scored

About

A continuously updated coding benchmark that draws new competitive programming problems from LeetCode, AtCoder, and Codeforces to limit training-data contamination.

Methodology

Collects competitive programming problems published after the training cutoff dates of the evaluated models. Evaluation scenarios include code generation, self-repair, code execution prediction, and test output prediction. The problem set is refreshed automatically to avoid benchmark contamination.
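The date-based filtering and the pass@1 score described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual harness: the `Problem` record, its field names, the cutoff date, and the pass/fail outcomes are all made up for the example, and pass@1 is simplified to one generated solution per problem.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    """Hypothetical record; field names are illustrative, not the dataset schema."""
    problem_id: str
    source: str      # e.g. "LeetCode", "AtCoder", "Codeforces"
    released: date   # date the problem was published


def uncontaminated(problems, training_cutoff):
    """Keep only problems published strictly after the model's training cutoff."""
    return [p for p in problems if p.released > training_cutoff]


def pass_at_1(first_attempt_passed):
    """Percent (0-100) of problems whose single generated solution passed all tests."""
    if not first_attempt_passed:
        return 0.0
    return 100.0 * sum(first_attempt_passed.values()) / len(first_attempt_passed)


problems = [
    Problem("p1", "LeetCode", date(2024, 1, 10)),
    Problem("p2", "AtCoder", date(2024, 6, 2)),
    Problem("p3", "Codeforces", date(2024, 9, 15)),
]
cutoff = date(2024, 3, 1)            # assumed cutoff for a hypothetical model
eligible = uncontaminated(problems, cutoff)

results = {"p2": True, "p3": False}  # made-up outcomes, one attempt per problem
score = pass_at_1({p.problem_id: results[p.problem_id] for p in eligible})
print([p.problem_id for p in eligible], score)
```

Because the filter depends on each model's own cutoff, two models may be scored on different problem subsets, so scores are only comparable when the evaluation window is the same.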


Model Leaderboard

Open-weight models are scored from leaderboard submissions. Commercial API models (GPT-4o, Claude, Gemini) are not submitted to the Open LLM Leaderboard; where they appear, their scores come from provider-reported benchmarks.

#	Model	Score
1	Qwen3 235B A22B	70.7%
2	Gemini 2.5 Pro	70.4%
3	DeepSeek R1	65.9%