Multimodal Tasks

behavior

AI tasks that require processing more than one type of input data (modality) at once, such as answering a text question about an image.
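As a minimal sketch of the definition above, a single request can be modeled as a list of parts, each tagged with its modality; the task counts as multimodal when more than one distinct modality appears. The names `InputPart`, `modalities`, and `is_multimodal` are hypothetical and chosen for illustration only; they do not correspond to any particular library or API.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class InputPart:
    """One piece of input (hypothetical type, for illustration only)."""
    modality: str   # e.g. "text", "image", "audio"
    content: bytes  # raw payload for that modality


def modalities(parts: List[InputPart]) -> Set[str]:
    """Return the distinct modalities present in a request."""
    return {part.modality for part in parts}


def is_multimodal(parts: List[InputPart]) -> bool:
    """A request is multimodal if it mixes two or more modalities."""
    return len(modalities(parts)) > 1


# An image plus a text question about it, as in the definition.
question = InputPart("text", b"What color is the car?")
photo = InputPart("image", b"<image bytes would go here>")  # placeholder payload

image_question = [question, photo]
text_only = [question]
```

Here `is_multimodal(image_question)` is true because the request combines text and image inputs, while `is_multimodal(text_only)` is false.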

Related Capabilities

Quality of vision, audio, and image understanding (how well each modality is understood, as distinct from whether the modality is supported at all)