A type of input or output data a model can process, such as text, images, or audio.
Quality of vision, audio, and image understanding (distinct from modality support)