Prize: $50,000

Problem description

In this challenge, your goal is to make satellite imagery taken from a significant angle more useful for time-sensitive applications like disaster and emergency response.

This project seeks to develop an algorithm that predicts geocentric pose from single-view oblique satellite images and generalizes well to unseen world regions. Oblique images are those taken from an angle, in contrast to "nadir" images looking straight down. Geocentric pose represents object height above ground and image orientation with respect to gravity. Solutions must produce pixel-level predictions of object heights, image-level predictions of orientation angle, and image-level predictions of scale. These come together to map surface-level features to ground level.


Data

The data set for this challenge includes satellite images of four cities: Jacksonville, Florida, USA; Omaha, Nebraska, USA; Atlanta, Georgia, USA; and San Fernando, Argentina. There are a total of 5,923 training images and 1,025 test images.

City                         Abbreviation   Training images   Test images
San Fernando, Argentina      ARG            2,325             463
Atlanta, Georgia, USA        ATL            704               264
Jacksonville, Florida, USA   JAX            1,098             120
Omaha, Nebraska, USA         OMA            1,796             178

Scores displayed on the public leaderboard while the competition is running may not be exactly the same as the final scores on the private leaderboard, which are used to determine final prize rankings. Variation depends on how samples from the data are used for evaluation.

Note on external data: External data is not allowed in this competition. Participants can use pre-trained computer vision models as long as they were available freely and openly in that form at the start of the competition.

The data provided includes four folders:

• train: all training data files, including RGB images, above ground level (AGL) images, and JSONs with vector flow information
• test_rgbs: RGB images for the test set
• submission_format: An example submission with placeholder values that demonstrates correct submission format
• train_nano: A small subset of 100 training records (RGB image, AGL image, and JSON vector flow for each). The nano set is provided for participants to more easily experiment with data processing and modeling pipelines before running them on the full dataset.

Folders are provided as TAR archives. The benchmark blog post walks through how to extract files from a TAR archive.
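If you prefer not to follow the blog post, Python's standard-library tarfile module can extract the archives directly. A minimal sketch (the archive and destination names are illustrative):

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path: str, dest_dir: str) -> list[str]:
    """Extract a TAR archive into dest_dir and return the extracted member names."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest_dir)
        return tar.getnames()


# Example (archive name is an assumption):
# names = extract_archive("train_nano.tar", "data/train_nano")
```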

Three datasets are also provided:

• metadata.csv
• geopose_train.csv
• geopose_test.csv

Metadata for the train and test set is provided in metadata.csv. Metadata includes:

• id: a randomly generated unique ID to reference each record
• city: abbreviation for the geographic location
• gsd: ground sample distance (GSD) in meters per pixel

Additional tables are provided with geocentric pose information separately for the training and test data. geopose_train.csv includes:

• id: a randomly generated unique ID to reference each record
• agl: name of the above ground level (AGL) height image file with per pixel height in cm
• json: name of the JSON file with vector flow scale and angle
• rgb: name of the RGB image file

geopose_test.csv includes only id and rgb - you will generate the AGL image and vector flow information.

RGB images, AGL images, and JSONs with vector flow for the training data are in the train folder. RGB images for the test set are in test_rgbs. The naming convention for files is:

File               File type   Naming format                  Example                 Provided for
RGB image          JPEG 2000   [city]_[image_id]_RGB.j2k      JAX_bZxjXA_RGB.j2k      train & test
AGL image          TIF         [city]_[image_id]_AGL.tif      JAX_bZxjXA_AGL.tif      train only
VFLOW information  JSON        [city]_[image_id]_VFLOW.json   JAX_bZxjXA_VFLOW.json   train only

Commercial satellite imagery is provided courtesy of DigitalGlobe. Atlanta images were derived from the public SpaceNet Dataset by SpaceNet Partners, licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Features

The features in this challenge are a set of 2048 x 2048 RGB images cropped from publicly available satellite images, provided courtesy of DigitalGlobe.

Each RGB image is a JPEG 2000 file (.j2k), compressed from the original TIF images to save space. Feature data also includes the city and the GSD in meters per pixel, i.e., the average real-world size of a pixel.
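Loading an image into a NumPy array can be done with Pillow (note that Pillow needs OpenJPEG support to read JPEG 2000). A sketch that works for any Pillow-readable format:

```python
import numpy as np
from PIL import Image


def load_rgb(path: str) -> np.ndarray:
    """Load an image file as an HxWx3 uint8 array."""
    with Image.open(path) as img:
        return np.asarray(img.convert("RGB"))


# rgb = load_rgb("JAX_bZxjXA_RGB.j2k")
# rgb.shape is (2048, 2048, 3) and rgb.dtype is uint8 for this dataset
```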

Images shown are from the public Urban Semantic 3D Dataset, provided courtesy of DigitalGlobe.

Images in the dataset capture a variety of diverse landscapes, including different land uses, levels of urbanization, seasons, and imaging viewpoints.

Feature data example

id city gsd
bZxjXA JAX 0.334
RGB image (JAX_bZxjXA_RGB.j2k)

array([[[152, 146, 147],
...,
[177, 179, 182]],

[[155, 149, 151],
...,
[149, 144, 140]]], dtype=uint8)
The shape of each RGB array is (2048, 2048, 3).

Labels

An RGB satellite image taken from an angle rather than overhead (left) and the same image transformed into geocentric pose representation (right). Object height is shown in grayscale, and vectors for orientation to gravity are shown in red. Adapted from Christie et al. “Learning Geocentric Object Pose in Oblique Monocular Images.” 2020.

You’ll be asked to provide geocentric pose for each RGB image, as shown in the right image above. This includes:

1. AGL image: A 2048 x 2048 image where each pixel indicates AGL height. Train set AGLs are provided as TIF images, and height is measured in centimeters.

2. Angle: The angle (direction) of the flow vectors in the 2D image plane, which describes the image’s orientation with respect to gravity. Angle is measured in radians, starting from the negative y axis and increasing counterclockwise. Assume that each pixel has the same angle, so only one angle value is needed for each image. For example, the angle in the sample record shown below is approximately 0.77 radians.
3. Scale: The conversion factor between vector field magnitudes (pixels) in the 2D plane of the image and object height (centimeters) in the real world. Scale is in pixels per centimeter and is based on the satellite’s imaging viewpoint. Scale is zero at true nadir. As with angle, assume each pixel in an image has the same scale.

$$\textrm{scale} = \frac{||a_2 - a_1||}{Z_2 - Z_1} = \frac{\textrm{Magnitude of the flow vector in image (pixels)}}{\textrm{Actual building height (cm)}}$$
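Rearranging this definition, the flow-vector magnitude for any pixel is simply scale times height. A sketch using the sample scale from the example record below (the function name is illustrative):

```python
def flow_magnitude_px(height_cm: float, scale_px_per_cm: float) -> float:
    """Magnitude of the 2D flow vector, in pixels, for an object of the given height."""
    return scale_px_per_cm * height_cm


# With the sample scale of 0.01021532 px/cm, a 20 m (2,000 cm) building
# corresponds to a flow vector about 20.4 pixels long:
# flow_magnitude_px(2000, 0.01021532)  -> 20.43064
```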

True values for scale and angle are derived from satellite image metadata. True height AGLs are derived from Light Detection and Ranging (LiDAR), a powerful remote sensing method that uses light to measure distance to the earth’s surface.

Note: Many AGL image arrays contain missing values. These pixels represent locations where the LiDAR that was used to assess true height did not get any data. In the training AGLs, 65535 is used as a placeholder for NaNs. You do not have to predict height for pixels with missing true height values - pixels that are missing in the ground truth AGLs will be excluded from performance evaluation.
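When loading training AGLs, you will likely want to convert the 65535 placeholder to NaN before computing statistics or losses. A minimal NumPy sketch:

```python
import numpy as np


def agl_with_nans(agl: np.ndarray) -> np.ndarray:
    """Convert a uint16 AGL array to float32, mapping the 65535 placeholder to NaN."""
    out = agl.astype(np.float32)
    out[agl == 65535] = np.nan
    return out
```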

Labelled training data example

geopose_train.csv
id agl json rgb
bZxjXA JAX_bZxjXA_AGL.tif JAX_bZxjXA_VFLOW.json JAX_bZxjXA_RGB.j2k
AGL image (JAX_bZxjXA_AGL.tif)

array([[8, 8, 6, ..., 0, 0, 0],
[20, 18, 4, ..., 0, 0, 0]], dtype=uint16)

The shape of the AGL array is (2048, 2048). AGLs show pixel height in cm and have data type uint16 - see the submission format section for more details.

Vector flow JSON (JAX_bZxjXA_VFLOW.json)

{"scale": 0.01021532, "angle": 0.771909}
Scale is in pixels/cm. Angle is in radians.

Performance evaluation

Submissions will be evaluated using the coefficient of determination R2, which measures squared prediction error relative to the variance of the true values.

$$R^2 = 1 - \frac{\textrm{residual sum of squared errors}}{\text{total sum of squared errors}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y_i})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

• $n$ = number of values in the dataset
• $y_i$ = $i$th true value
• $\hat{y_i}$ = $i$th predicted value
• $\bar{y}$ = average of all true $y$ values

Test locations have rural, suburban, and urban scenes, each with different value ranges for object heights and their corresponding flow vectors. For leaderboard evaluation, R2 for heights and flow vectors will be assessed for each geographic location independently and then averaged to produce a final score.
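A NumPy sketch of this scoring scheme (function names are illustrative; the organizers' exact implementation may differ):

```python
import numpy as np


def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: 1 - RSS/TSS."""
    rss = np.sum((y_true - y_pred) ** 2)
    tss = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - rss / tss


def city_averaged_r2(y_true_by_city: dict, y_pred_by_city: dict) -> float:
    """Score each city independently, then average the per-city R2 values."""
    scores = [r2(y_true_by_city[c], y_pred_by_city[c]) for c in y_true_by_city]
    return float(np.mean(scores))
```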

Submission format

The submission file for this competition consists of geocentric pose information (AGL with pixel height, vector flow angle, and vector flow scale) for each image. See the benchmark blog post for a step-by-step walkthrough of how to save your predictions in the correct submission format. For each test set RGB image, you'll need to submit:

1. Above ground level (AGL) image

A 2048 x 2048 .tif file with height predictions. The name of the AGL file should be <city_abbreviation>_<image_id>_AGL.tif. AGLs should show height in centimeters and have data type uint16. To make the size of participant submissions manageable, your AGL images should be saved using a lossless TIFF compression. In the benchmark, we compress each AGL TIFF by passing tiff_adobe_deflate as the compression argument to the Image.save() function from the Pillow library.
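A sketch of saving a uint16 AGL with the compression described above, using Pillow (the filename follows the naming table below):

```python
import numpy as np
from PIL import Image


def save_agl(agl: np.ndarray, path: str) -> None:
    """Save a uint16 height array as a losslessly compressed TIFF."""
    assert agl.dtype == np.uint16, "AGL predictions must be uint16"
    Image.fromarray(agl).save(path, compression="tiff_adobe_deflate")


# save_agl(pred_heights, "JAX_bZxjXA_AGL.tif")
```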

2. Vector flow

A JSON file with vector flow information. The name of the JSON file should be <city_abbreviation>_<image_id>_VFLOW.json. Example JSON file:

{"scale": 0.010215321926341547, "angle": 0.7719090975770877}


Scale is in pixels/cm. Angle is in radians, starting at 0 from the negative y axis and increasing counterclockwise.
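Writing the file is straightforward with the standard-library json module; a sketch (the function name is illustrative):

```python
import json


def save_vflow(scale: float, angle: float, path: str) -> None:
    """Write vector flow predictions in the required JSON format."""
    with open(path, "w") as f:
        json.dump({"scale": scale, "angle": angle}, f)


# save_vflow(0.010215321926341547, 0.7719090975770877, "JAX_bZxjXA_VFLOW.json")
```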

Naming conventions for submission files:

File type   Naming format                  Example
AGL         <city>_<image_id>_AGL.tif      JAX_bZxjXA_RGB.j2k -> JAX_bZxjXA_AGL.tif
JSON        <city>_<image_id>_VFLOW.json   JAX_bZxjXA_RGB.j2k -> JAX_bZxjXA_VFLOW.json

All of the submission files should be compressed into a single .tar.gz file. A correctly prepared submission should be around 1.6 GB; files significantly larger than this will be rejected.
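The standard-library tarfile module can package the predictions. A sketch assuming all AGL TIFFs and VFLOW JSONs sit in one directory (the directory and output names are assumptions):

```python
import tarfile
from pathlib import Path


def package_submission(pred_dir: str, out_path: str = "submission.tar.gz") -> None:
    """Bundle all .tif and .json predictions into one gzipped TAR archive."""
    with tarfile.open(out_path, "w:gz") as tar:
        for f in sorted(Path(pred_dir).iterdir()):
            if f.suffix in {".tif", ".json"}:
                # arcname keeps files at the archive root, without the directory prefix
                tar.add(f, arcname=f.name)


# package_submission("predictions/")
```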

Model write-up bonus

In addition to getting the best possible predictions for rectified images, the project team is interested in identifying interesting, innovative ideas among modeling approaches. These ideas may be useful for assembling the results of the challenge for journal article submission.

Contributions of particular interest to consider for the write-up include:

• Sharing insights regarding observed biases in the data and methods to enable generalization
• Describing techniques for identifying failure cases and methods to address them
• Identifying state of the art learning methods that can be successfully applied to our task
• Documenting any other lessons learned or insights

The top 15 finalists on the private leaderboard will have the opportunity to submit a write-up of their solution using the template provided on the data download page.

Evaluation

Bonus prizes will be awarded to the top 3 write-ups selected by a panel of judges, composed of domain experts from NGA and JHU/APL. The judging panel will evaluate each report based on the following criteria:

• Rigor (40%): To what extent is the write-up built on sound, sophisticated quantitative analysis and a performant statistical model?
• Innovation (40%): How useful are the contents of the write-up in expanding beyond well-established methods or using them in novel ways to tackle the challenge?
• Clarity (20%): How clearly are the solution concepts, processes, and results communicated and visualized?

Note: The judging will be done primarily on a technical basis rather than on language, since many participants may not be native English speakers.

Submission format

Model write-ups will be coordinated by email for eligible finalists from the Prediction Contest.

Write-ups must be no more than 8 pages and adhere to the format requirements listed in the provided template. A sample write-up is provided for the baseline solution.

Good luck!

If you have any questions, you can always visit the user forum. Enjoy the challenge!

Approved for public release, 21-545