Geometric Action Model for Robot Policy Learning

Jisang Han, Seonghu Jeon, Jaewoo Jung, René Zurbrügg, Honggyu An et al. | June 15, 2026 | arXiv

Key Takeaway

Using 3D geometric reasoning as a shared foundation for both world prediction and action generation makes robot policies more accurate and efficient than 2D-based approaches, while requiring fewer parameters than large foundation models.

Summary

This paper introduces the Geometric Action Model (GAM), a robot control system that uses a pretrained 3D geometry foundation model both to understand the physical world and to predict robot actions.

architecture multimodal

Key Terms

geometric-foundation-model vision-language-action-model contact-rich-manipulation world-action-model latent-token-prediction