Grok 3 has a reputation for directness and a willingness to engage with edgy or unconventional questions that other models tend to sidestep. It leans into reasoning-heavy tasks and handles complex analytical problems with notable depth. The trade-off is that its personality can feel more unfiltered than polished, which suits some workflows and jars others.

| Benchmark | Score (%) | Type | Recorded |
|---|---|---|---|
| AIME 2024 | 93.3 | accuracy | 5d ago |
| AIME 2025 | 86.7 | accuracy | 5d ago |
| Aider Polyglot | 53.3 | accuracy | 5d ago |