The ability to track objects and keep each one's identity consistent as it moves across multiple frames of a video sequence.
Quality of vision, audio, and image understanding (distinct from modality support)