Prize: $50,000

Problem description

In this challenge, your goal is to make satellite imagery taken from a significant angle more useful for time-sensitive applications like disaster and emergency response.

This project seeks to develop an algorithm that predicts geocentric pose from single-view oblique satellite images and generalizes well to unseen world regions. Oblique images are those taken from an angle, in contrast to "nadir" images looking straight down. Geocentric pose represents object height above ground and image orientation with respect to gravity. Solutions must produce pixel-level predictions of object heights, image-level predictions of orientation angle, and image-level predictions of scale. These come together to map surface-level features to ground level.


Data

The data set for this challenge includes satellite images of four cities: Jacksonville, Florida, USA; Omaha, Nebraska, USA; Atlanta, Georgia, USA; and San Fernando, Argentina. There are a total of 5,923 training images and 1,025 test images.

City                         Abbreviation   Training images   Test images
San Fernando, Argentina      ARG            2,325             463
Atlanta, Georgia, USA        ATL            704               264
Jacksonville, Florida, USA   JAX            1,098             120
Omaha, Nebraska, USA         OMA            1,796             178

Scores displayed on the public leaderboard while the competition is running may not be exactly the same as the final scores on the private leaderboard, which are used to determine final prize rankings. Variation depends on how samples from the data are used for evaluation.

Note on external data: External data is not allowed in this competition. Participants can use pre-trained computer vision models as long as they were available freely and openly in that form at the start of the competition.

The data provided includes four folders:

• train: all training data files, including RGB images, above ground level (AGL) images, and JSONs with vector flow information
• test_rgbs: RGB images for the test set
• submission_format: An example submission with placeholder values that demonstrates correct submission format
• train_nano: A small subset of 100 training records (RGB image, AGL image, and JSON vector flow for each). The nano set is provided for participants to more easily experiment with data processing and modeling pipelines before running them on the full dataset.

Folders are provided as TAR archives. The benchmark blog post walks through how to extract files from a TAR archive.
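If you prefer not to follow the blog post, Python's standard-library tarfile module can extract the archives directly. A minimal sketch (the archive and destination names are illustrative):

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a TAR archive into dest_dir and return the extracted member names."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest_dir)
        return tar.getnames()


# Example (archive name is an assumption):
# names = extract_archive("train_nano.tar", "data/train_nano")
```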

Three datasets are also provided:

• metadata.csv
• geopose_train.csv
• geopose_test.csv

Metadata for the train and test set is provided in metadata.csv. Metadata includes:

• id: a randomly generated unique ID to reference each record
• city: abbreviation for the geographic location
• gsd: ground sample distance (GSD) in meters per pixel

Additional tables are provided with geocentric pose information separately for the training and test data. geopose_train.csv includes:

• id: a randomly generated unique ID to reference each record
• agl: name of the above ground level (AGL) height image file with per pixel height in cm
• json: name of the JSON file with vector flow scale and angle
• rgb: name of the RGB image file

geopose_test.csv includes only id and rgb - you will generate the AGL image and vector flow information.

RGB images, AGL images, and JSONs with vector flow for the training data are in the train folder. RGB images for the test set are in test_rgbs. The naming convention for files is:

File               File type   Naming format                  Example                 Provided for
RGB image          JPEG 2000   [city]_[image_id]_RGB.j2k      JAX_bZxjXA_RGB.j2k      train & test
AGL image          TIF         [city]_[image_id]_AGL.tif      JAX_bZxjXA_AGL.tif      train only
VFLOW information  JSON        [city]_[image_id]_VFLOW.json   JAX_bZxjXA_VFLOW.json   train only

Commercial satellite imagery is provided courtesy of DigitalGlobe. Atlanta images were derived from the public SpaceNet Dataset by SpaceNet Partners, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Features

The features in this challenge are a set of 2048 x 2048 RGB images cropped from publicly available satellite images, provided courtesy of DigitalGlobe.

Each RGB image is a JPEG 2000 file (.j2k), compressed from the original TIF images to save space. Feature data also includes the city and the GSD in meters per pixel, i.e., the average real-world size of a pixel.
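Loading an image into a NumPy array can be done with Pillow (note that Pillow needs OpenJPEG support to read JPEG 2000). A sketch that works for any Pillow-readable format:

```python
import numpy as np
from PIL import Image


def load_rgb(path: str) -> np.ndarray:
    """Load an image file as an HxWx3 uint8 array."""
    with Image.open(path) as img:
        return np.asarray(img.convert("RGB"))


# rgb = load_rgb("JAX_bZxjXA_RGB.j2k")
# rgb.shape is (2048, 2048, 3) and rgb.dtype is uint8 for this dataset
```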

Images shown are from the public Urban Semantic 3D Dataset, provided courtesy of DigitalGlobe.

Images in the dataset capture a variety of diverse landscapes, including different land uses, levels of urbanization, seasons, and imaging viewpoints.

Feature data example

id city gsd
bZxjXA JAX 0.334
RGB image (JAX_bZxjXA_RGB.j2k)

array([[[152, 146, 147],
...,
[177, 179, 182]],

[[155, 149, 151],
...,
[149, 144, 140]]], dtype=uint8)
The shape of each RGB array is (2048, 2048, 3).

Labels

An RGB satellite image taken from an angle rather than overhead (left) and the same image transformed into geocentric pose representation (right). Object height is shown in grayscale, and vectors for orientation to gravity are shown in red. Adapted from Christie et al. “Learning Geocentric Object Pose in Oblique Monocular Images.” 2020.

You’ll be asked to provide geocentric pose for each RGB image, as shown in the right image above. This includes:

1. AGL image: A 2048 x 2048 image where each pixel indicates AGL height. Train set AGLs are provided as TIF images, and height is measured in centimeters.

2. Angle: The angle (direction) of the flow vectors in the 2D image plane, which describes the image’s orientation with respect to gravity. Angle is measured in radians, starting from the negative y axis and increasing counterclockwise. Assume that each pixel has the same angle, so only one angle value is needed for each image. For example, the angle in the sample record shown below is approximately 0.77 radians.
3. Scale: The conversion factor between vector field magnitudes (pixels) in the 2D plane of the image and object height (centimeters) in the real world. Scale is in pixels per centimeter and is based on the satellite’s imaging viewpoint. Scale is zero at true nadir. As with angle, assume each pixel in an image has the same scale.

$$\textrm{scale} = \frac{||a_2 - a_1||}{Z_2 - Z_1} = \frac{\textrm{Magnitude of the flow vector in image (pixels)}}{\textrm{Actual building height (cm)}}$$
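Rearranging this definition, the flow-vector magnitude for any pixel is simply scale times height. A sketch using the sample scale from the example record below (the function name is illustrative):

```python
def flow_magnitude_px(height_cm: float, scale_px_per_cm: float) -> float:
    """Magnitude of the 2D flow vector, in pixels, for an object of the given height."""
    return scale_px_per_cm * height_cm


# With the sample scale of 0.01021532 px/cm, a 20 m (2,000 cm) building
# corresponds to a flow vector about 20.4 pixels long:
# flow_magnitude_px(2000, 0.01021532)  -> 20.43064
```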

True values for scale and angle are derived from satellite image metadata. True height AGLs are derived from Light Detection and Ranging (LiDAR), a powerful remote sensing method that uses light to measure distance to the earth’s surface.

Note: Many AGL image arrays contain missing values. These pixels represent locations where the LiDAR that was used to assess true height did not get any data. In the training AGLs, 65535 is used as a placeholder for NaNs. You do not have to predict height for pixels with missing true height values - pixels that are missing in the ground truth AGLs will be excluded from performance evaluation.
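When loading training AGLs, you will likely want to convert the 65535 placeholder to NaN before computing statistics or losses. A minimal NumPy sketch:

```python
import numpy as np


def agl_with_nans(agl: np.ndarray) -> np.ndarray:
    """Convert a uint16 AGL array to float32, mapping the 65535 placeholder to NaN."""
    out = agl.astype(np.float32)
    out[agl == 65535] = np.nan
    return out
```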

Labelled training data example

geopose_train.csv
id agl json rgb
bZxjXA JAX_bZxjXA_AGL.tif JAX_bZxjXA_VFLOW.json JAX_bZxjXA_RGB.j2k
AGL image (JAX_bZxjXA_AGL.tif)

array([[8, 8, 6, ..., 0, 0, 0],
[20, 18, 4, ..., 0, 0, 0]], dtype=uint16)

The shape of the AGL array is (2048, 2048). AGLs show pixel height in cm and have data type uint16 - see the submission format section for more details.

Vector flow JSON (JAX_bZxjXA_VFLOW.json)

{"scale": 0.01021532, "angle": 0.771909}
Scale is in pixels/cm. Angle is in radians.

Performance evaluation

Submissions will be evaluated using the coefficient of determination R2, which measures squared prediction error relative to the variance of the true values.

$$R^2 = 1 - \frac{\textrm{residual sum of squared errors}}{\text{total sum of squared errors}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y_i})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

• $n$ = number of values in the dataset
• $y_i$ = $i$th true value
• $\hat{y_i}$ = $i$th predicted value
• $\bar{y}$ = average of all true $y$ values

Test locations have rural, suburban, and urban scenes, each with different value ranges for object heights and their corresponding flow vectors. For leaderboard evaluation, R2 for heights and flow vectors will be assessed for each geographic location independently and then averaged to produce a final score.
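A NumPy sketch of this scoring scheme (function names are illustrative; the organizers' exact implementation may differ):

```python
import numpy as np


def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: 1 - RSS/TSS."""
    rss = np.sum((y_true - y_pred) ** 2)
    tss = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - rss / tss


def city_averaged_r2(y_true_by_city: dict, y_pred_by_city: dict) -> float:
    """Score each city independently, then average the per-city R2 values."""
    scores = [r2(y_true_by_city[c], y_pred_by_city[c]) for c in y_true_by_city]
    return float(np.mean(scores))
```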

Submission format

The submission file for this competition consists of geocentric pose information (AGL with pixel height, vector flow angle, and vector flow scale) for each image. See the benchmark blog post for a step-by-step walkthrough of how to save your predictions in the correct submission format. For each test set RGB image, you'll need to submit:

1. Above ground level (AGL) image

A 2048 x 2048 .tif file with height predictions. The name of the AGL file should be <city_abbreviation>_<image_id>_AGL.tif. AGLs should show height in centimeters and have data type uint16. To make the size of participant submissions manageable, your AGL images should be saved using a lossless TIFF compression. In the benchmark, we compress each AGL TIFF by passing tiff_adobe_deflate as the compression argument to the Image.save() function from the Pillow library.
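A sketch of saving a uint16 AGL with the compression described above, using Pillow (the filename follows the naming table below):

```python
import numpy as np
from PIL import Image


def save_agl(agl: np.ndarray, path: str) -> None:
    """Save a uint16 height array as a losslessly compressed TIFF."""
    assert agl.dtype == np.uint16, "AGL predictions must be uint16"
    Image.fromarray(agl).save(path, compression="tiff_adobe_deflate")


# save_agl(pred_heights, "JAX_bZxjXA_AGL.tif")
```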

2. Vector flow

A JSON file with vector flow information. The name of the JSON file should be <city_abbreviation>_<image_id>_VFLOW.json. Example JSON file:

{"scale": 0.010215321926341547, "angle": 0.7719090975770877}


Scale is in pixels/cm. Angle is in radians, starting at 0 from the negative y axis and increasing counterclockwise.
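Writing the file is straightforward with the standard-library json module; a sketch (the function name is illustrative):

```python
import json


def save_vflow(scale: float, angle: float, path: str) -> None:
    """Write vector flow predictions in the required JSON format."""
    with open(path, "w") as f:
        json.dump({"scale": scale, "angle": angle}, f)


# save_vflow(0.010215321926341547, 0.7719090975770877, "JAX_bZxjXA_VFLOW.json")
```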

Naming conventions for submission files:

File type   Naming format                  Example
AGL         <city>_<image_id>_AGL.tif      JAX_bZxjXA_RGB.j2k -> JAX_bZxjXA_AGL.tif
JSON        <city>_<image_id>_VFLOW.json   JAX_bZxjXA_RGB.j2k -> JAX_bZxjXA_VFLOW.json

All of the submission files should be compressed into a single .tar.gz file. A correctly prepared submission should be around 1.6 GB; files significantly larger than this will be rejected.
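The standard-library tarfile module can package the predictions. A sketch assuming all AGL TIFFs and VFLOW JSONs sit in one directory (the directory and output names are assumptions):

```python
import tarfile
from pathlib import Path


def package_submission(pred_dir: str, out_path: str = "submission.tar.gz") -> None:
    """Bundle all .tif and .json predictions into one gzipped TAR archive."""
    with tarfile.open(out_path, "w:gz") as tar:
        for f in sorted(Path(pred_dir).iterdir()):
            if f.suffix in {".tif", ".json"}:
                # arcname keeps files at the archive root, without the directory prefix
                tar.add(f, arcname=f.name)


# package_submission("predictions/")
```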

Model write-up bonus

In addition to getting the best possible predictions for rectified images, the project team is interested in identifying interesting, innovative ideas among modeling approaches. These ideas may be useful for assembling the results of the challenge for journal article submission.

Contributions of particular interest to consider for the write-up include:

• Sharing insights regarding observed biases in the data and methods to enable generalization
• Describing techniques for identifying failure cases and methods to address them
• Identifying state of the art learning methods that can be successfully applied to our task
• Documenting any other lessons learned or insights

The top 15 finalists on the private leaderboard will have the opportunity to submit a write-up of their solution using the template provided on the data download page.

Evaluation

Bonus prizes will be awarded to the top 3 write-ups selected by a panel of judges, composed of domain experts from NGA and JHU/APL. The judging panel will evaluate each report based on the following criteria:

• Rigor (40%): To what extent is the write-up built on sound, sophisticated quantitative analysis and a performant statistical model?
• Innovation (40%): How useful are the contents of the write-up in expanding beyond well-established methods or using them in novel ways to tackle the challenge?
• Clarity (20%): How clearly are the solution concepts, processes, and results communicated and visualized?

Note: The judging will be done primarily on a technical basis rather than on language, since many participants may not be native English speakers.

Submission format

Model write-ups will be coordinated by email for eligible finalists from the Prediction Contest.

Write-ups must be no more than 8 pages and adhere to the format requirements listed in the provided template. A sample write-up is provided for the baseline solution.

Good luck!

If you have any questions, you can always visit the user forum. Enjoy the challenge!

Approved for public release, 21-545