Numerical representations of audio that capture the meaningful features of speech in a compact form, useful for tasks like speaker identification or speech similarity.
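As a rough illustration of the "speech similarity" use mentioned above, the sketch below compares embedding vectors with cosine similarity, one common choice of metric. The vectors, their dimension, and the names `cosine_similarity`, `utterance_a`, `utterance_b`, and `utterance_c` are made up for this example. In practice the embeddings would come from a trained speech model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings, invented for illustration only.
utterance_a = [0.9, 0.1, 0.3, 0.2]
utterance_b = [0.8, 0.2, 0.4, 0.1]    # points in nearly the same direction as utterance_a
utterance_c = [-0.5, 0.9, -0.2, 0.7]  # points in a very different direction

sim_ab = cosine_similarity(utterance_a, utterance_b)
sim_ac = cosine_similarity(utterance_a, utterance_c)

# A higher score means the two embeddings are more alike.
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

A speaker identification system might apply a threshold to such a score to decide whether two utterances come from the same speaker. Any threshold would have to be tuned for the specific model and data.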
Quality of vision, audio, and image understanding (distinct from modality support)