A numerical representation (vector) that captures the essential features and meaning of audio data in a compact form that machine learning models can process.
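To make the idea concrete, here is a minimal sketch of how a waveform of any length can be reduced to a short, fixed-length vector. It is a toy illustration, not how learned audio embeddings are produced in practice: the function names (`toy_audio_embedding`, `sine`), the frame size, and the choice of hand-crafted features (RMS energy and zero-crossing rate) are all assumptions made for this example. Real embeddings typically come from a trained neural network.

```python
import math


def toy_audio_embedding(samples, frame_size=256):
    """Map a mono waveform (list of floats) to a 4-dimensional vector.

    Hypothetical helper for illustration only: it pools two hand-crafted
    per-frame features instead of using a learned model.
    """
    energies, crossings = [], []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        # Root-mean-square energy: roughly how loud the frame is.
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_size))
        # Zero-crossing rate: a crude proxy for how high-pitched or noisy it is.
        flips = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        crossings.append(flips / (frame_size - 1))
    if not energies:
        raise ValueError("need at least one full frame of audio")

    def mean(values):
        return sum(values) / len(values)

    def std(values):
        m = mean(values)
        return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

    # Pooling over frames gives the same vector length for any clip duration.
    return [mean(energies), std(energies), mean(crossings), std(crossings)]


def sine(freq_hz, seconds=0.5, rate=8000):
    """Generate a synthetic test tone (assumed 8 kHz sample rate)."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


low = toy_audio_embedding(sine(220))
high = toy_audio_embedding(sine(1760))
longer = toy_audio_embedding(sine(220, seconds=1.0))
```

Here `low`, `high`, and `longer` all have the same length even though the clips differ in pitch and duration, and the third component (mean zero-crossing rate) is larger for the higher tone. That is the sense in which the vector is compact and comparable across clips.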
Quality of vision, audio, and image understanding: how well these inputs are interpreted, as distinct from modality support (whether they are accepted at all).