Agentic Tasks
Multi-step tasks requiring planning, tool use, and self-correction
Scores indicate use-case fit on a scale of 1–5. Models with the same score perform comparably; order within a score level is not a ranking.