The ability of an AI model to process and reason about multiple types of input data (such as images and text) simultaneously.
The quality of a model's vision, audio, and image understanding (distinct from which modalities it supports).