A model's ability to perform well on new, unseen data that differs from what it was trained on.
Adhering to complex, structured, or constrained instructions
Multi-step reasoning, logic puzzles, mathematical problem-solving