Aider Polyglot Coding Benchmark
Tests AI coding assistants on programming tasks across multiple languages using the Aider coding tool. Measures the ability to edit existing code so that it passes its tests.
Models are asked to modify existing code across multiple programming languages to make failing tests pass. Tasks are drawn from Exercism practice exercises in C++, Go, Java, JavaScript, Python, and Rust. The benchmark evaluates practical code-editing ability, not just code generation.
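As a rough illustration of how a score of this kind could be computed, the sketch below treats each task as a single pass/fail outcome and reports the fraction of tasks whose tests pass after the model's edit. The `TaskResult` record, its fields, and the sample task ids are hypothetical names introduced for this example; they are not part of Aider or of this pipeline, and the actual scoring may differ.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one benchmark task (hypothetical record, for illustration only)."""
    task_id: str        # made-up identifier
    language: str       # programming language of the task
    tests_passed: bool  # True if the test suite passed after the model's edit


def pass_rate(results):
    """Fraction of tasks whose tests passed; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(r.tests_passed for r in results) / len(results)


def pass_rate_by_language(results):
    """Group results by language and compute a pass rate for each group."""
    grouped = {}
    for r in results:
        grouped.setdefault(r.language, []).append(r)
    return {lang: pass_rate(rs) for lang, rs in grouped.items()}


# Made-up sample data, not real benchmark results.
sample = [
    TaskResult("task-1", "python", True),
    TaskResult("task-2", "python", False),
    TaskResult("task-3", "rust", True),
    TaskResult("task-4", "go", False),
]

overall = pass_rate(sample)
per_language = pass_rate_by_language(sample)
```

Reporting a per-language breakdown alongside the overall rate is one way to see whether a model's editing ability is uneven across languages.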
No model scores have been recorded yet.
Scores will appear here as the pipeline processes model data.