Hierarchical planning with multi-scale world models lets robots handle long-horizon tasks with 4x less compute and works zero-shot in new environments, a practical win for embodied AI systems.
This paper tackles long-horizon robot control by learning world models at multiple time scales and planning hierarchically across them. Rolling a single-step model far into the future compounds prediction error, so the approach instead learns coarse-grained and fine-grained models and plans at both levels, which reduces computation and improves success on real-world tasks such as pick-and-place.
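
To make the two-level idea concrete, here is a minimal sketch of hierarchical planning on a toy problem. It is not the paper's method: the two "world models" are hand-written stand-ins for learned ones, the planner is a plain exhaustive search, and every name and constant (`fine_model`, `coarse_model`, `plan`, `hierarchical_plan`, `ACTIONS`, `K`) is a hypothetical choice made for illustration. The state is an integer position on a line; the coarse model predicts the effect of `K` fine steps at once.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # toy discrete action set (assumption, not from the paper)
K = 5                 # fine steps covered by one coarse step (assumption)


def fine_model(x, a):
    """Stand-in for a learned one-step model: move by a single cell."""
    return x + a


def coarse_model(x, u):
    """Stand-in for a learned K-step model: move by K cells at once."""
    return x + K * u


def plan(model, x0, target, horizon):
    """Exhaustive search over action sequences of length `horizon`.

    Returns the first sequence whose predicted final state is closest to
    `target`, together with the predicted states along the way.
    """
    best_cost, best_actions, best_states = None, None, None
    for actions in product(ACTIONS, repeat=horizon):
        x, states = x0, []
        for a in actions:
            x = model(x, a)
            states.append(x)
        cost = abs(x - target)
        if best_cost is None or cost < best_cost:
            best_cost, best_actions, best_states = cost, list(actions), states
    return best_actions, best_states


def hierarchical_plan(x0, goal, coarse_horizon=3):
    """Pick subgoals with the coarse model, then reach each with the fine model."""
    _, subgoals = plan(coarse_model, x0, goal, coarse_horizon)
    x, fine_actions = x0, []
    # A last fine-scale segment toward the true goal closes any gap the
    # coarse plan could not resolve at its resolution.
    for target in subgoals + [goal]:
        actions, states = plan(fine_model, x, target, K)
        fine_actions.extend(actions)
        x = states[-1]
    return subgoals, fine_actions, x


subgoals, fine_actions, final_x = hierarchical_plan(x0=0, goal=12)
print(subgoals, len(fine_actions), final_x)  # [0, 5, 10] 20 12
```

In this toy, the two-level search evaluates 27 coarse sequences plus 4 x 243 fine sequences, 999 in total, whereas an exhaustive search over all 20 fine steps would have to consider 3^20 sequences. That gap is a property of the toy setup only and says nothing about the 4x figure reported for the paper.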