LensWalk is an AI framework that lets language models actively control how they watch videos while reasoning about them. Giving the agent control over its visual perception, deciding what to look at and when, significantly improves video reasoning accuracy, and this active observation approach works as a plug-and-play upgrade for existing vision-language models.
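The summary doesn't spell out LensWalk's interface, so the sketch below is only a guess at what an active-observation loop might look like in practice: instead of receiving a fixed set of uniformly sampled frames up front, the model starts from a coarse glance at the video and then repeatedly decides which time window to inspect next, until it is confident enough to answer. Every name here (`Observation`, `AgentState`, `vlm_step`, `extract_frames`) is a hypothetical placeholder, not LensWalk's actual API.

```python
# Illustrative sketch of an active-observation loop (NOT LensWalk's real API).
# The model alternates between reasoning and choosing what to look at next.

from dataclasses import dataclass, field


@dataclass
class Observation:
    """A window of frames the agent chose to inspect (hypothetical)."""
    start_s: float      # window start, in seconds
    end_s: float        # window end, in seconds
    num_frames: int     # how many frames to sample from the window


@dataclass
class AgentState:
    """The question plus the reasoning accumulated so far (hypothetical)."""
    question: str
    notes: list[str] = field(default_factory=list)


def vlm_step(state: AgentState, frames) -> tuple[str, Observation | None]:
    """Hypothetical call into a vision-language model.

    Returns (reasoning_text, next_observation). A None observation means
    the model believes it has seen enough and the loop should stop.
    """
    raise NotImplementedError("plug in your VLM here")


def extract_frames(video_path: str, obs: Observation):
    """Hypothetical frame sampler, e.g. built on OpenCV or decord."""
    raise NotImplementedError("plug in your video decoder here")


def active_observation_loop(video_path: str, question: str,
                            max_steps: int = 8) -> str:
    """Let the model decide what to watch and when, step by step."""
    state = AgentState(question=question)
    # Begin with a coarse glance at the opening of the video.
    obs: Observation | None = Observation(start_s=0.0, end_s=60.0, num_frames=8)
    for _ in range(max_steps):
        frames = extract_frames(video_path, obs)
        thought, obs = vlm_step(state, frames)
        state.notes.append(thought)
        if obs is None:        # the model decided it has seen enough
            break
    return state.notes[-1]     # final reasoning step doubles as the answer
```

Because a loop like this only needs a model that can emit a "look here next" action alongside its reasoning, a wrapper of this shape is one plausible reading of the plug-and-play claim: the underlying vision-language model stays unchanged, and the observation controller simply sits on top of it.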