AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

Jisong Cai, Long Ling, Shiwei Chu, Zhongshan Liu, Jiayue Kang et al. | June 8, 2026 | arXiv

Key Takeaway

Decoupling world prediction and action execution into asynchronous temporal streams—where the world model runs slowly and the action model runs fast—improves both robot control performance and computational efficiency without requiring robot pretraining data.

Summary

This paper presents AHA-WAM, a robot control system that runs world prediction and action execution at different rates. A slow video model learns long-term scene patterns, while a fast action model executes short movements by reusing the context the video model has already computed. This enables responsive closed-loop control without redundant computation.
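The two-rate idea can be illustrated with a toy control loop. This is a minimal sketch under stated assumptions, not the paper's implementation: `SlowWorldModel`, `FastActionModel`, `run_control_loop`, and `world_period` are hypothetical names, the "context" is a small list of summary statistics standing in for cached model state, and the two streams are interleaved in a single synchronous loop rather than run truly asynchronously.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SlowWorldModel:
    """Hypothetical stand-in for the slow video/world model."""
    calls: int = 0

    def encode(self, observation: List[float]) -> List[float]:
        # Pretend this call is expensive; its result is cached and reused.
        self.calls += 1
        mean = sum(observation) / len(observation)
        return [mean, max(observation), min(observation)]


@dataclass
class FastActionModel:
    """Hypothetical stand-in for the fast action model."""
    calls: int = 0

    def act(self, observation: List[float], context: List[float]) -> float:
        # Cheap step: combine the fresh observation with the cached context.
        self.calls += 1
        return observation[-1] - context[0]


def run_control_loop(
    observations: List[List[float]], world_period: int
) -> Tuple[List[float], int, int]:
    """Refresh the context every `world_period` steps; act on every step."""
    world, policy = SlowWorldModel(), FastActionModel()
    context: Optional[List[float]] = None
    actions: List[float] = []
    for step, obs in enumerate(observations):
        if step % world_period == 0:
            # Slow stream: recompute the shared context only occasionally.
            context = world.encode(obs)
        # Fast stream: every step reuses the most recent cached context.
        actions.append(policy.act(obs, context))
    return actions, world.calls, policy.calls


if __name__ == "__main__":
    obs_seq = [[float(t), float(t) + 1.0] for t in range(8)]
    actions, world_calls, policy_calls = run_control_loop(obs_seq, world_period=4)
    print(actions, world_calls, policy_calls)
```

With eight observations and `world_period=4`, the slow model is called twice while the fast model is called eight times, which is the source of the computational saving described above. The ratio between the two rates is an illustrative choice here, not a value taken from the paper.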

efficiency

Key Terms

world-model, diffusion-transformer, closed-loop-control, key-value-caches, context-routing