Just Zoom In: Cross-View Geo-Localization via Autoregressive Zooming

Yunus Talha Erzurumlu, Jiyong Kwag, Alper Yilmaz | March 26, 2026 | arXiv

Key Takeaway

Treating geo-localization as a sequential zooming problem over maps, rather than image retrieval, achieves better results and avoids the limitations of contrastive learning approaches that struggle with landmark visibility mismatches.

Summary

This paper tackles cross-view geo-localization—matching street-view photos to satellite maps to pinpoint a camera's location without GPS. Instead of the standard approach of comparing images in a shared embedding space, the authors propose a new method that zooms progressively into a satellite map, making sequential decisions to narrow down the location.
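
To make the zooming idea concrete, here is a minimal sketch of coarse-to-fine localization over a square map. It is an illustration under stated assumptions, not the authors' architecture: the 2x2 split, the names `split_2x2`, `zoom_in`, and `toy_score_for`, and the scoring function are all hypothetical. In the paper's setting the score would presumably come from a learned model conditioned on the street-view photo and the earlier zoom decisions; here a toy score that simply checks whether a known target point lies in a cell stands in for it.

```python
from typing import Callable, List, Tuple

# A map region as (x_min, y_min, x_max, y_max) in arbitrary map units.
Box = Tuple[float, float, float, float]


def split_2x2(box: Box) -> List[Box]:
    """Split a region into four equal children (hypothetical 2x2 zoom grid)."""
    x0, y0, x1, y1 = box
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    return [
        (x0, y0, xm, ym),  # child 0
        (xm, y0, x1, ym),  # child 1
        (x0, ym, xm, y1),  # child 2
        (xm, ym, x1, y1),  # child 3
    ]


def zoom_in(
    root: Box,
    score: Callable[[Box, List[int]], float],
    steps: int,
) -> Tuple[List[int], Box]:
    """Greedily pick one child per step and return the choices and final region.

    `score` receives the candidate region and the history of earlier choices,
    which is what makes the decision sequence autoregressive in form.
    """
    box, choices = root, []
    for _ in range(steps):
        children = split_2x2(box)
        best = max(range(len(children)), key=lambda i: score(children[i], choices))
        choices.append(best)
        box = children[best]
    return choices, box


def toy_score_for(target: Tuple[float, float]) -> Callable[[Box, List[int]], float]:
    """Stand-in for a learned scorer: 1.0 if the target lies in the region."""
    tx, ty = target

    def score(box: Box, history: List[int]) -> float:
        # The toy scorer ignores `history`; a learned model would not have to.
        x0, y0, x1, y1 = box
        return 1.0 if x0 <= tx < x1 and y0 <= ty < y1 else 0.0

    return score


choices, box = zoom_in((0.0, 0.0, 1024.0, 1024.0), toy_score_for((700.0, 300.0)), steps=4)
print(choices, box)
```

With these toy inputs, four zoom steps shrink a 1024-unit map to a 64-unit cell containing the target, and the location is expressed as a short sequence of discrete choices rather than as a nearest neighbor in an embedding space.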

Key Terms

cross-view-matching, coarse-to-fine-reasoning, autoregressive-zooming, contrastive-retrieval