A model component that processes video frames and converts them into compact numerical representations that capture the video's visual and motion content.
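
As a rough illustration of the input/output contract described above, the following is a minimal sketch, not a real video encoder. It assumes grayscale frames stored as nested lists and uses hand-crafted features (average pooling for visual content, frame-to-frame differences for motion) in place of the learned layers an actual model would use. The names `pool_frame`, `encode_video`, and `grid` are hypothetical and chosen only for this example.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame: rows of pixel intensities


def pool_frame(frame: Frame, grid: int) -> List[float]:
    """Average-pool a frame into grid x grid cells, returned as a flat list.

    Assumes the frame is at least grid x grid pixels so no cell is empty.
    """
    height, width = len(frame), len(frame[0])
    pooled = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            pooled.append(sum(cell) / len(cell))
    return pooled


def encode_video(frames: List[Frame], grid: int = 2) -> List[float]:
    """Toy stand-in for a learned encoder (an assumption of this sketch).

    Maps any number of frames to a fixed-length vector of 2 * grid * grid floats:
    first half  = time-averaged pooled intensities (visual content),
    second half = time-averaged absolute change between consecutive frames (motion).
    """
    if not frames:
        raise ValueError("need at least one frame")
    pooled = [pool_frame(frame, grid) for frame in frames]
    n_cells = grid * grid
    appearance = [sum(p[i] for p in pooled) / len(pooled) for i in range(n_cells)]
    n_steps = len(pooled) - 1
    if n_steps == 0:
        # A single frame carries no motion information.
        motion = [0.0] * n_cells
    else:
        motion = [
            sum(abs(pooled[t + 1][i] - pooled[t][i]) for t in range(n_steps)) / n_steps
            for i in range(n_cells)
        ]
    return appearance + motion


# Usage: a bright pixel moving one step to the right across two 2x2 frames.
moving = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [0.0, 0.0]],
]
embedding = encode_video(moving, grid=2)
```

The output length depends only on `grid`, not on the number of frames or their resolution, which is the sense in which the representation is compact. A learned encoder would typically replace both hand-crafted halves with trained layers, but the overall shape of the mapping (many frames in, one fixed-size vector or a short sequence of vectors out) is the same.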