Visual-Language Model

architecture

A model that processes images and text jointly, relating visual content to language so that it can answer questions about an image or describe what the image shows.
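To make "processes images and text jointly" concrete, here is a minimal toy sketch of one common pattern: the image is cut into patches, each patch and each text token is mapped to a vector of the same size, and the two lists are joined into a single sequence that a downstream model would read. This entry does not specify an architecture, so the approach, every function name, and the sizes below are assumptions for illustration; the "encoders" are trivial stand-ins, not a real vision or text encoder.

```python
# Toy sketch: fuse image patches and text tokens into one input sequence.
# All names and sizes are hypothetical; the encoders are stand-ins only.

EMBED_DIM = 4  # assumed shared vector size for image and text


def embed_patches(image, patch_size):
    """Split a 2D grayscale image into square patches, one vector per patch.

    Stand-in for a vision encoder: each patch becomes EMBED_DIM copies
    of its mean pixel value.
    """
    height, width = len(image), len(image[0])
    vectors = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            pixels = [
                image[r][c]
                for r in range(top, min(top + patch_size, height))
                for c in range(left, min(left + patch_size, width))
            ]
            mean = sum(pixels) / len(pixels)
            vectors.append([mean] * EMBED_DIM)
    return vectors


def embed_text(prompt):
    """Stand-in for a text embedding table: one vector per whitespace token."""
    return [[float(len(token))] * EMBED_DIM for token in prompt.split()]


def build_sequence(image, prompt, patch_size=2):
    """Concatenate image vectors and text vectors into a single sequence."""
    return embed_patches(image, patch_size) + embed_text(prompt)


image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [128, 128, 64, 64],
    [128, 128, 64, 64],
]
sequence = build_sequence(image, "what is shown here")
print(len(sequence))  # 4 image patches + 4 text tokens
```

The point of the sketch is only the shape of the data: once both modalities live in one sequence of equal-sized vectors, a single model can attend across them, which is what lets it relate a question to regions of the image.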

Related Capabilities

Multimodal

Quality of vision, audio, and image understanding (distinct from modality support)
