Aider Polyglot

Aider Polyglot Coding Benchmark

coding
Score: 0-100 (% of tasks completed)
16 models scored

About

Tests how well AI models edit code in multiple programming languages when driven by the Aider coding tool. Measures the ability to edit existing source files so that their tests pass.

Methodology

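Models are asked to edit existing source files in several programming languages (C++, Go, Java, JavaScript, Python, and Rust) so that failing unit tests pass. The tasks are 225 Exercism coding exercises rather than issues from open-source projects. The benchmark evaluates practical code editing ability, not just code generation.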
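The score for this benchmark is the percentage of tasks completed. The sketch below shows one way that figure could be computed from per-task outcomes. The `TaskResult` structure, the `completion_score` function, and the task names are hypothetical and used only for illustration; this is not the benchmark's own harness, and the sample results are made up.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one benchmark task (hypothetical structure, for illustration)."""
    task_id: str
    tests_passed: bool


def completion_score(results: list[TaskResult]) -> float:
    """Return the percentage of tasks whose tests pass, on a 0-100 scale."""
    if not results:
        raise ValueError("no task results to score")
    completed = sum(1 for r in results if r.tests_passed)
    return 100.0 * completed / len(results)


# Made-up results for four tasks; not real benchmark data.
example = [
    TaskResult("python/task-a", True),
    TaskResult("rust/task-b", False),
    TaskResult("go/task-c", True),
    TaskResult("java/task-d", True),
]
print(f"{completion_score(example):.1f}%")  # prints "75.0%"
```

Under this definition a task either counts as completed or not, so the score is a plain pass rate with no partial credit for tasks where only some tests pass.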

Model Leaderboard

Shows open-weight models only. Commercial API models (GPT-4o, Claude, Gemini) are not submitted to the Open LLM Leaderboard; their scores come from provider-reported benchmarks.

#	Model	Score
1	GPT-5	88.0%
2	Grok 4	79.6%
3	o3	76.9%
4	Gemini 2.5 Pro	76.9%
5	o4-mini	72.0%