Overhead Geopose Challenge Hosted By NGA


About the project

Figure 1. Single-view geocentric pose regression predicts object heights and vector fields mapping surface features to ground level, enabling feature rectification and occlusion mapping. In this illustration, darker shades of gray have larger height values, red arrows map surface features to ground level, and occluded pixels are blue. Results are shown from the baseline implementation. Illustration is adapted from Christie et al., 2021.

Project background

This project focuses on rectifying above-ground features in oblique monocular images from overhead cameras to remove observed object parallax with respect to ground, enabling accurate object localization for Earth observation tasks including semantic mapping, map alignment, change detection, and vision-aided navigation. Current methods for these tasks focus on near-nadir images. However, in response to natural disasters and other time-critical world events, the first available images are often oblique.

An object’s geocentric pose, defined as the height above ground and orientation with respect to gravity, is a powerful representation of real-world structure for object detection, segmentation, and localization tasks. Recent works published at the Conference on Computer Vision and Pattern Recognition (CVPR) 2020 and CVPR Earthvision Workshop 2021 demonstrated the first methods to learn geocentric object pose from oblique monocular satellite images with supervision provided by airborne LiDAR. The data from those works has been publicly released, and code has been open sourced as a baseline solution to encourage further exploration of this novel task (see additional resources below).

These approaches leveraged recent advances in single-view height prediction from overhead images (shown in Figure 2) to adapt the geocentric pose representation of object geometry for remote sensing with oblique satellite images. The two pose attributes, height above ground and orientation with respect to gravity, enable rectification of above ground level (AGL) features, as shown in Figure 3.

In this challenge, we seek creative solvers to help extend this work and dramatically improve its accuracy. We provide pixel-level object heights and image-level angles and scale factors that define vector fields mapping surface features to ground level in satellite images (shown in Figure 4). We also provide a public baseline solution described in Christie et al. (2021) and available on GitHub.
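As a minimal sketch of how these quantities could combine, the Python below builds a per-pixel vector field from a height map, one image-level angle, and one image-level scale factor. The function name, the pixels-per-meter unit for the scale, and the convention that the angle is measured from the image x-axis are illustrative assumptions, not the challenge's data specification or the baseline's code.

```python
import math


def flow_field(heights, angle_rad, scale_px_per_m):
    """Return per-pixel (dx, dy) offsets, in pixels, from surface feature to ground.

    heights         -- 2D list of above-ground heights in meters (assumed unit)
    angle_rad       -- image-level flow direction, measured from the x-axis
                       (assumed convention; the real data may differ)
    scale_px_per_m  -- image-level scale factor in pixels per meter (assumed unit)
    """
    # One unit direction is shared by every pixel in the image.
    ux, uy = math.cos(angle_rad), math.sin(angle_rad)
    # The offset magnitude grows linearly with object height.
    return [
        [(scale_px_per_m * h * ux, scale_px_per_m * h * uy) for h in row]
        for row in heights
    ]


if __name__ == "__main__":
    heights = [[0.0, 10.0], [2.0, 0.0]]
    for row in flow_field(heights, math.pi / 2, 0.5):
        print(row)
```

Because the angle and scale are shared across the image, only the magnitude varies per pixel, and ground-level pixels (height zero) get a zero vector.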

Figure 2. Single-view depth prediction methods (left) have been very successful for practical close-range computer vision tasks. For longer-range remote sensing tasks, single-view height prediction methods (right) have recently been proposed. Illustration is adapted from Mou and Zhu, 2018.

Figure 3. From left to right: (i) An RGB satellite image taken from an angle rather than overhead. (ii) RGB image transformed into geocentric pose representation. Object height is shown in grayscale, and vectors for orientation to gravity are shown in red. (iii) Rectified height of each pixel in meters based on geocentric pose. Adapted from Christie et al. “Learning Geocentric Object Pose in Oblique Monocular Images.” 2020.
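The rectification step illustrated in Figure 3 can be sketched, under simplifying assumptions, as moving each pixel's height along its vector to the ground location and keeping the tallest value where several pixels land on the same cell. The function name, nearest-pixel rounding, the angle convention (measured from the image x-axis), and the rule that cells receiving no pixel are treated as occluded are all assumptions made for illustration; this is not the baseline implementation.

```python
import math


def rectify_heights(heights, angle_rad, scale_px_per_m):
    """Shift each pixel's height to its ground location (hypothetical helper).

    Returns (rectified, occluded): rectified holds the height assigned to each
    ground cell (None where nothing landed), and occluded flags those empty cells.
    """
    rows, cols = len(heights), len(heights[0])
    ux, uy = math.cos(angle_rad), math.sin(angle_rad)
    rectified = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            h = heights[r][c]
            # Ground location of this surface pixel, rounded to the nearest cell.
            gc = c + int(round(scale_px_per_m * h * ux))
            gr = r + int(round(scale_px_per_m * h * uy))
            if 0 <= gr < rows and 0 <= gc < cols:
                # When several pixels land on one cell, keep the tallest.
                if rectified[gr][gc] is None or h > rectified[gr][gc]:
                    rectified[gr][gc] = h
    # Cells that received no pixel were hidden in the oblique view.
    occluded = [[v is None for v in row] for row in rectified]
    return rectified, occluded


if __name__ == "__main__":
    rectified, occluded = rectify_heights([[0.0, 4.0, 0.0, 0.0]], 0.0, 0.5)
    print(rectified)
    print(occluded)
```

In the small example, the 4 m pixel in column 1 moves two columns to the right, and the cell it vacates is flagged as occluded because no other pixel maps there.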

Figure 4. A review of the affine geometry and simplifying assumptions for this task is shown above. The geometric projection of a local sub-image extracted from a large satellite image is well-approximated with an affine camera, which preserves invariant properties of parallelism and ratio of lengths on parallel lines. For each image in our data set, we provide pixel-level object heights and image-level angles and scale factors that define the vector fields mapping surface features to ground level. Illustration is adapted from Christie et al., 2021.

About the project team

NGA delivers world-class geospatial intelligence (GEOINT) that provides a decisive advantage to policymakers, military service members, intelligence professionals and first responders. Anyone who sails a U.S. ship, flies a U.S. aircraft, makes national policy decisions, fights wars, locates targets, responds to natural disasters, or even navigates with a cellphone relies on NGA. NGA enables all of these critical actions and shapes decisions that impact our world through the indispensable discipline of GEOINT.

JHU/APL solves complex research, engineering, and analytical problems that present critical challenges to our nation. JHU/APL, the nation’s largest university-affiliated research center, provides U.S. government agencies with deep expertise in specialized fields to support national priorities and technology development programs.

Additional resources

The first published works on this task are below. The more recent Computer Vision and Pattern Recognition Workshop (CVPRW) 2021 paper provides an introduction to the task as it is posed for the challenge, a description of the baseline solution, and details about the data set.

The references above cite many related and motivating published works. Of particular interest for this challenge are the many related methods in monocular depth prediction. An especially intriguing recent method for monocular height prediction is reported in the following; however, note that for this challenge no semantic labels are provided.

An accessible introduction to single-view height prediction is provided in the following:

The SpaceNet 4 public prize challenge explored the impact of oblique imaging geometry on semantic segmentation tasks. The following paper discusses one of the motivating use cases for our challenge.

Approved for public release, 21-545