Active perception—where an AI agent decides what to observe rather than passively processing everything—enables better video understanding with lower computational cost and improves with more reasoning steps.
OmniAgent is an AI agent that watches videos intelligently by deciding what to pay attention to, rather than processing every frame. It combines audio, visual, and text understanding in a reasoning loop, building up memory of important moments. This approach scales better with video length and achieves top performance on video understanding benchmarks.