Overhead Geopose Challenge

Help make overhead imagery more useful for time-sensitive applications like disaster response. Build computer vision algorithms that can effectively model the height and pose of ground objects in monocular satellite images taken from oblique angles.

$50,000 in prizes
Completed Jul 2021
438 joined

[Image: Urban Semantic 3D data. Images shown are from the public Urban Semantic 3D Dataset, provided courtesy of DigitalGlobe.]


We rely on 3D maps for urban planning, emergency response planning, and navigation; however, our current ability to very quickly update these maps after dynamic world events is limited. The level of detail achieved by the top models from the competition exceeded our expectations. We are excited to apply the methods to increase the currency and fidelity of existing 3D models.

— Monte Turner, NGA Research Foundational GEOINT Office Director

Why

Overhead satellite imagery provides critical, time-sensitive information in areas such as disaster response, navigation, and security. Most current methods for using aerial imagery assume the images are taken from directly overhead, known as near-nadir. However, the first images available are often taken from an angle; they are oblique. The distortions introduced by these camera orientations complicate useful tasks such as change detection, vision-aided navigation, and map alignment.

The Solution

To address this need, the National Geospatial-Intelligence Agency (NGA) and Johns Hopkins University Applied Physics Lab partnered with the National Aeronautics and Space Administration and DrivenData to run a machine learning challenge aimed at making satellite imagery taken from a significant angle more useful for time-sensitive applications.

Participants competed to develop the best computer vision models for inferring the geocentric pose of ground objects from oblique satellite images. Geocentric pose is an object's height above the ground and its orientation with respect to gravity. Recent work published at the Conference on Computer Vision and Pattern Recognition (CVPR) 2020 and the CVPR EarthVision Workshop 2021 demonstrated the first methods to learn geocentric pose from oblique monocular satellite images, with ground truth labels provided by airborne lidar. That data and benchmark model served as the foundation for improvements in this competition.
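The papers above represent geocentric pose compactly: a per-pixel above-ground-level (AGL) height map, plus a single orientation angle and a pixels-per-meter scale shared across the image. As a rough illustration, here is a minimal sketch of how that representation turns heights into the flow vectors used to rectify an oblique view; the function and values are illustrative assumptions, not the challenge's actual code.

```python
import numpy as np

def vertical_flow(agl_height, angle, scale):
    """Convert geocentric pose into per-pixel flow vectors.

    agl_height : (H, W) array of above-ground-level heights in meters
    angle      : direction (radians) in which vertical structures
                 project onto the image, shared by all pixels
    scale      : flow magnitude in pixels per meter of height
    """
    magnitude = scale * agl_height      # displacement in pixels
    dx = magnitude * np.cos(angle)      # x-component of the flow
    dy = magnitude * np.sin(angle)      # y-component of the flow
    return dx, dy

# Toy height map: ground, a low roof, and a tall roof (meters).
heights = np.array([[0.0, 10.0, 25.0]])
dx, dy = vertical_flow(heights, angle=np.pi / 4, scale=0.5)
# Taller objects are displaced further from their true footprints;
# shifting each pixel by (-dx, -dy) approximates a nadir view.
```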

The Results

Over the course of the challenge, we received more than 750 submissions from participants around the world. Submissions were evaluated using R-squared, which measures how much of the variance in the ground truth data is captured by the model, averaged across the four cities in the dataset. San Fernando, Argentina proved particularly tricky for this task because the city has fewer tall buildings and more architectural diversity.
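For reference, the coefficient of determination is R-squared = 1 - SS_res / SS_tot. Below is a minimal sketch of computing it per city and averaging, assuming per-pixel values are pooled into flat arrays; the city names and numbers are hypothetical, and this is not the challenge's official scoring code.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variance
    return 1.0 - ss_res / ss_tot

# Hypothetical per-city (truth, prediction) pairs, averaged into one score.
cities = {
    "city_a": (np.array([1.0, 4.0, 9.0]), np.array([1.1, 3.8, 9.2])),
    "city_b": (np.array([2.0, 5.0, 7.0]), np.array([2.3, 4.6, 7.4])),
}
score = np.mean([r_squared(t, p) for t, p in cities.values()])
print(f"mean R-squared across cities: {score:.3f}")
```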

[Figure: RMSE graph]

The winning solution surpassed an R-squared of 0.9, an impressive increase of more than 10 percentage points over the previous benchmark (0.7988)! All four top participants also increased the R-squared for San Fernando by an even greater margin. Overall, these participants dramatically improved the benchmark model's ability to identify small vertical features, capture object outlines at higher resolution, and correct for the most oblique viewpoints.

All the prize-winning solutions from this competition, including detailed reports, have been made available on GitHub for anyone to use and learn from.


RESULTS ANNOUNCEMENT + MEET THE WINNERS

WINNING MODELS ON GITHUB

URBAN SEMANTIC 3D DATASET


Approved for public release, 21-943