Combining explicit 3D geometry (camera poses and spatial relationships) with visual matching substantially improves cross-view localization and enables zero-shot transfer between ground and drone views without paired training data.
This paper tackles cross-view object geo-localization: finding a target object in satellite imagery given a ground or drone photo of it. The authors introduce a large dataset of more than 220K image pairs with geometric metadata, along with GAGeo, a unified framework that simultaneously predicts object locations, masks, and camera poses by reasoning about 3D spatial structure rather than relying on appearance matching alone.
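
To make the role of camera pose concrete, here is a minimal sketch of how a known pose constrains where an object can appear in a satellite tile. It is not GAGeo's method, which the summary does not detail. The function name `project_to_satellite`, its parameters, the north-up tile, and the flat-ground assumption are all hypothetical choices made for illustration.

```python
import math


def project_to_satellite(cam_px, heading_deg, bearing_deg, distance_m, gsd_m_per_px):
    """Return the (col, row) pixel of an object in a north-up satellite tile.

    Illustrative assumptions (not taken from the paper):
      cam_px        -- (col, row) of the camera position in the tile
      heading_deg   -- camera yaw, measured clockwise from north
      bearing_deg   -- object direction relative to the optical axis,
                       clockwise positive
      distance_m    -- ground distance from camera to object, in metres
      gsd_m_per_px  -- ground sampling distance of the tile, metres per pixel
    The ground is treated as flat and the tile as north-up.
    """
    # Absolute direction of the object, clockwise from north.
    azimuth = math.radians(heading_deg + bearing_deg)

    # Offset from the camera to the object in metres.
    east_m = distance_m * math.sin(azimuth)
    north_m = distance_m * math.cos(azimuth)

    # Convert metres to pixels; image rows grow toward the south.
    col = cam_px[0] + east_m / gsd_m_per_px
    row = cam_px[1] - north_m / gsd_m_per_px
    return col, row
```

For example, a camera at pixel (100, 100) facing north with an object 10 m straight ahead, on a tile with 0.5 m per pixel, places the object 20 rows above the camera. An appearance-only matcher has no such constraint and must search the whole tile.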