GPT-4 Turbo is a workhorse model that handles complex reasoning, long documents, and nuanced instruction-following reliably. Its 128k-token context window lets it take in large codebases or lengthy documents in a single pass. It can occasionally over-explain or hedge, but its breadth across coding, analysis, and writing tasks makes it a dependable general-purpose choice.

| Benchmark | Score | Metric | Recorded |
|---|---|---|---|
| HellaSwag | 95.3 | accuracy | 5d ago |
| WinoGrande | 87.5 | accuracy | 5d ago |
| TruthfulQA | 59.0 | accuracy | 5d ago |
| SciCode | 1.5 | main_problem_pass@1 | 5d ago |
| GSM8K | 97.0 | accuracy | 5d ago |
| ARC-Challenge | 96.3 | accuracy | 5d ago |