Abstract
In this paper, we present a high-performing solution to the UAVM 2025 Challenge [24], which focuses on matching narrow Field-of-View (FOV) street-level images to corresponding satellite imagery using the University-1652 dataset. As panoramic Cross-View Geo-Localisation nears peak performance, it becomes increasingly important to explore more practical problem formulations. Real-world scenarios rarely offer panoramic street-level queries; instead, queries typically consist of limited-FOV images captured with unknown camera parameters. Our work prioritises finding the highest achievable performance under these constraints, pushing the limits of existing architectures. Our method first retrieves candidate satellite image embeddings for a given query, then applies a re-ranking stage that refines the ordering of the top candidates. This two-stage approach enables more precise matching, even under the significant viewpoint and scale variations inherent in the task. Through experimentation, we demonstrate that our approach achieves competitive results, attaining R@1 and R@10 retrieval rates of 30.21% and 63.13%, respectively. This underscores the potential of optimised retrieval and re-ranking strategies in advancing practical geo-localisation performance. Code is available at github.com/tavisshore/VICI.

The ground image building's distinctive multi-level structure with red/orange balconies matches the building visible in the center-left of the satellite, which has a similar tiered appearance and dark base. The grassy slope leading up to the building, the winding path, and the surrounding trees in the ground image are consistent with the landscape features around this building in the satellite image. The ground camera was likely positioned on the stairs or the path in the lower-left portion of the satellite image, looking northeast towards the building.
The ground image shows a brutalist concrete building surrounded by dense trees, with a sloped area and a grate-like structure on the left. The satellite image shows a building with a similar footprint and is heavily surrounded by dense trees, matching the ground image's environment. Crucially, the ground image's sloped area with the grate-like structure on the left corresponds to the solar panels visible on the sloped ground in the bottom left of the satellite image. The ground camera was likely positioned on the pathway or sloped area in the bottom left of the satellite image, looking northeast towards the main building.

Figure 1: VICI localisation example. Top left: query image. Top right: top retrieved satellite image. Bottom: justification for this satellite image being re-ranked to Top-1.
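
The two-stage pipeline summarised in the abstract (candidate retrieval followed by re-ranking of the top candidates, evaluated with R@K) can be illustrated with a minimal sketch. It is not the implementation from the released code: the use of cosine similarity, the function names `retrieve`, `rerank`, and `recall_at_k`, the toy embeddings, and the `scores` table standing in for a second-stage scorer are all assumptions made purely for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: Vector, gallery: Sequence[Vector], k: int) -> List[int]:
    """Stage 1: indices of the k gallery embeddings most similar to the query."""
    order = sorted(range(len(gallery)),
                   key=lambda i: cosine(query, gallery[i]),
                   reverse=True)
    return order[:k]


def rerank(candidates: Sequence[int],
           score_fn: Callable[[int], float],
           top_n: int) -> List[int]:
    """Stage 2: reorder only the first top_n candidates by score_fn.

    score_fn is a hypothetical stand-in for a second-stage scorer;
    candidates beyond top_n keep their stage-1 order.
    """
    head = sorted(candidates[:top_n], key=score_fn, reverse=True)
    return head + list(candidates[top_n:])


def recall_at_k(rankings: Sequence[Sequence[int]],
                truths: Sequence[int],
                k: int) -> float:
    """Fraction of queries whose true match appears in the top k results."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


# Toy data (assumed): four satellite embeddings and one ground-level query.
gallery = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.2]
true_match = 1  # index of the correct satellite image for this query

candidates = retrieve(query, gallery, k=3)        # [0, 1, 2]
scores = {0: 0.2, 1: 0.9, 2: 0.5}                 # made-up second-stage scores
reranked = rerank(candidates, lambda i: scores[i], top_n=2)  # [1, 0, 2]

r1_before = recall_at_k([candidates], [true_match], k=1)  # 0.0
r1_after = recall_at_k([reranked], [true_match], k=1)     # 1.0
```

In this toy case the correct image sits at rank 2 after retrieval and moves to rank 1 after re-ranking, which is the kind of improvement the second stage targets. Re-ranking can only promote images that stage 1 already placed within the top `top_n`, so R@K for K at or above `top_n` is unchanged by it.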