A capable generalist that handles reasoning, coding, and instruction-following with solid reliability. At 70B parameters, it sits in a practical middle ground — more capable than smaller models but runnable on high-end consumer or modest server hardware. Follows instructions faithfully and produces structured outputs cleanly, though it can occasionally lack the nuance of larger frontier models on complex multi-step tasks.
| Benchmark | Score | Type | Recorded |
|---|---|---|---|
| ARC-Challenge | 93.3 | accuracy | 5d ago |
| WinoGrande | 85.3 | accuracy | 5d ago |
| HellaSwag | 86.8 | accuracy | 5d ago |
| GSM8K | 94.9 | accuracy | 5d ago |
| TruthfulQA | 60.7 | accuracy | 5d ago |
| SciCode | 0.0 | main_problem_pass@1 | 5d ago |