Vision-Language Tasks

behavior

AI tasks that require jointly understanding visual information from images and the text that accompanies them, such as describing an image (captioning) or answering questions about it (visual question answering).

Related Capabilities

Multimodal

Quality of vision, audio, and image understanding (distinct from modality support)
