Video MLLMs can be dramatically more efficient by encoding frames predictively: send full frames only when needed, use compact change descriptions otherwise. This cuts token usage to 1/7 while improving accuracy and reducing latency from 9+ seconds to 1.6 seconds.
AdaCodec is a new way to encode videos for AI models that intelligently decides when to send full frames versus compact descriptions of changes.