Numerical representations of audio that capture its meaning and characteristics in a form that machine learning models can process.
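As a rough illustration of the idea, the sketch below maps a waveform to a short fixed-length vector and compares vectors with cosine similarity. It is only a hand-crafted stand-in: real audio embeddings are typically produced by a trained model, whereas this toy version just measures signal energy at a few probe frequencies. The function names (`tone`, `toy_embedding`, `cosine_similarity`), the sample rate, and the probe frequencies are all assumptions made for the example.

```python
import math


def tone(freq_hz, sample_rate=8000, n_samples=800):
    """Generate a pure sine tone as a list of float samples (hypothetical test signal)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n_samples)]


def toy_embedding(samples, sample_rate=8000, probe_freqs=(200, 400, 800, 1600)):
    """Map a waveform to a fixed-length unit vector.

    Each component is the signal energy at one probe frequency, found by
    correlating the samples with a cosine and a sine at that frequency.
    This is a hand-crafted feature vector, not a learned embedding.
    """
    energies = []
    for f in probe_freqs:
        re = sum(s * math.cos(2 * math.pi * f * t / sample_rate) for t, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * f * t / sample_rate) for t, s in enumerate(samples))
        energies.append(re * re + im * im)
    # Normalise to unit length; fall back to 1.0 for an all-zero (silent) signal.
    norm = math.sqrt(sum(e * e for e in energies)) or 1.0
    return [e / norm for e in energies]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


emb_400 = toy_embedding(tone(400))
emb_405 = toy_embedding(tone(405))    # close in pitch to the 400 Hz tone
emb_1600 = toy_embedding(tone(1600))  # far from the 400 Hz tone

print(cosine_similarity(emb_400, emb_405))   # expected to be close to 1
print(cosine_similarity(emb_400, emb_1600))  # expected to be close to 0
```

The point of the sketch is the shape of the interface: variable-length audio goes in, a fixed-length vector comes out, and similar sounds land near each other in that vector space.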
Quality of a model's vision, audio, and image understanding, as distinct from which modalities it supports.