Using 3D geometric reasoning as a shared foundation for both world prediction and action generation makes robot policies more accurate and efficient than policies built on 2D representations, while requiring fewer parameters than large foundation models.
This paper introduces the Geometric Action Model (GAM), a robot control system that uses a pretrained 3D geometry foundation model both to understand the physical world and to predict robot actions.
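
To make the shared-foundation idea concrete, the sketch below shows one set of 3D features feeding two lightweight heads, one for world prediction and one for action prediction. It is an illustration only, not GAM's actual architecture: the random matrix standing in for the pretrained 3D geometry backbone, the per-point displacement form of the world prediction, the mean pooling, and all sizes (`FEAT_DIM`, `ACTION_DIM`, `N_POINTS`) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
FEAT_DIM, ACTION_DIM, N_POINTS = 16, 7, 32

# Random stand-in for a frozen pretrained 3D geometry backbone (assumption).
W_backbone = rng.standard_normal((3, FEAT_DIM))
# Two lightweight heads that read the same features (assumption).
W_world = rng.standard_normal((FEAT_DIM, 3))
W_action = rng.standard_normal((FEAT_DIM, ACTION_DIM))


def encode(points):
    """Map an (N, 3) point cloud to (N, FEAT_DIM) per-point features."""
    return np.tanh(points @ W_backbone)


def predict_world(features):
    """World head: per-point 3D displacement to the next step, shape (N, 3)."""
    return features @ W_world


def predict_action(features):
    """Action head: mean-pool over points, then map to one action vector."""
    return features.mean(axis=0) @ W_action


points = rng.standard_normal((N_POINTS, 3))
features = encode(points)                       # computed once, shared by both heads
next_points = points + predict_world(features)  # predicted future geometry
action = predict_action(features)               # predicted robot action
```

The point of the design is that `encode` runs once per observation, so both predictions are grounded in the same 3D representation and the heads themselves can stay small.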