Claude 3.5 Sonnet operates like a sharp generalist who rarely needs to be asked twice: it follows nuanced instructions carefully, writes with clarity and a natural tone, and handles complex reasoning without losing the thread. It sits in a practical middle ground, more capable than lighter models on multi-step tasks without the latency of heavier ones. It is occasionally cautious in sensitive areas, but consistent and reliable across a wide range of everyday tasks.

| Benchmark | Score | Metric | Recorded |
|---|---|---|---|
| Aider Polyglot | 51.6 | accuracy | 5d ago |
| GSM8K | 96.4 | accuracy | 5d ago |
| SWE-Bench | 49.0 | accuracy | 5d ago |
| SciCode | 4.6 | main_problem_pass@1 | 5d ago |
| TAU2 | 46.0 | accuracy | 5d ago |