A model's ability to understand and reason about visual information in images, connecting what it sees to language and concepts.
Quality of vision, audio, and image understanding (as distinct from whether a modality is supported at all)
Multi-step reasoning, logic puzzles, and mathematical problem-solving