Echo-Memory: A Controlled Study of Memory in Action World Models

Wayne King, Zeyue Xue, Yuxuan Bian, Jie Huang, Haoran Li et al. | June 8, 2026 | arXiv

Key Takeaway

When building video world models, memory capacity matters more than compression, and how memory is accessed (for example, through state-space recurrence) matters as much as whether memory is used at all.
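To make the distinction between memory capacity and memory access structure concrete, here is a toy sketch. It is not the paper's architecture: the function names, the scalar "segment features", and the `decay` and `gain` parameters are all assumptions chosen for illustration. It contrasts a fixed-capacity buffer that stores recent segments verbatim with a scalar linear state-space recurrence, h_t = decay * h_{t-1} + gain * x_t, which folds the whole history into one fixed-size state.

```python
from collections import deque


def window_memory(features, capacity):
    """Hypothetical buffer memory: keep the last `capacity` features verbatim."""
    buf = deque(maxlen=capacity)  # oldest entry is dropped once the buffer is full
    snapshots = []
    for x in features:
        buf.append(x)
        snapshots.append(list(buf))  # what the model could read at this step
    return snapshots


def ssm_memory(features, decay=0.5, gain=1.0):
    """Toy scalar state-space recurrence: h_t = decay * h_{t-1} + gain * x_t."""
    h = 0.0
    states = []
    for x in features:
        h = decay * h + gain * x  # past inputs fade geometrically, never vanish abruptly
        states.append(h)
    return states


# One informative segment followed by three empty ones (illustrative values).
segments = [1.0, 0.0, 0.0, 0.0]
window_states = window_memory(segments, capacity=2)
ssm_states = ssm_memory(segments, decay=0.5)

print(window_states)
print(ssm_states)
```

In this toy setting the buffer loses the first segment entirely once two newer segments arrive, while the recurrent state still carries a decayed trace of it. Both memories have a fixed size, so the difference comes from how the history is written and read rather than from whether memory exists.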

Summary

This paper systematically compares memory mechanisms in video generation models that produce multi-segment videos conditioned on text and camera actions.

architecture evaluation

Key Terms

world-model state-space-models memory-mechanism diffusion-process