A sequence of processing steps that handles multiple types of input data (such as text and images) together in a single workflow.
Quality of vision, audio, and image understanding (distinct from modality support, that is, whether a given input type is accepted at all).