Open AI Caribbean Challenge: Mapping Disaster Risk from Aerial Imagery

Can you predict the roof material of buildings from drone imagery? Leverage aerial imagery in St. Lucia, Guatemala, and Colombia to more accurately map disaster risk at scale.

$10,000 in prizes

dec 2019

1,419 joined

Problem description

In this challenge, you will predict roof type from drone imagery. The data consists of overhead imagery of seven locations across three countries, with labeled building footprints. Your goal is to classify each building footprint by its roof material type.

The features in this dataset

The only features in this dataset are the images themselves and the building footprints in the GeoJSONs.

Images

The imagery consists of seven large, high-resolution Cloud Optimized GeoTIFFs, one for each of the seven areas. The spatial resolution of the images is roughly 4 cm.

Colombia

Area Name	Resolution	Pixel Width x Height
`borde_rural`	~ 4 cm	52318 x 31315
`borde_soacha`	~ 4.25 cm	40159 x 45650

Guatemala

Area Name	Resolution	Pixel Width x Height
`mixco_1_and_ebenezer`	~ 4.3 cm	27604 x 26641
`mixco_3`	~ 3.8 cm	26066 x 19271

St. Lucia

Area Name	Resolution	Pixel Width x Height
`castries`	~ 4.5 cm	50027 x 62570
`dennery`	~ 4.2 cm	21184 x 41534
`gros_islet`	~ 3.6 cm	53492 x 90729

Note: Castries and Gros Islet contain labels from an unverified automated process. For this reason, images from Castries and Gros Islet are included only in the training dataset.

It is up to you to decide whether or not you want to utilize the potentially noisy labels from Castries and Gros Islet in your training data. These can easily be filtered out using the verified column in train_labels.csv or the verified attribute in the GeoJSON FeatureCollections. This column is True if the ground truth label is verified; for Gros Islet and Castries, it is False.
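
As a minimal sketch of that filtering step, the snippet below keeps only verified rows using Python's standard `csv` module. The sample rows, the building IDs, the exact header names, and the assumption that the boolean is written as the text `True`/`False` are all illustrative; check them against the real train_labels.csv.

```python
import csv
import io

# Hypothetical excerpt of train_labels.csv. The IDs, the header names, and
# the spelling of the boolean values are assumptions made for this sketch.
SAMPLE = """id,concrete_cement,healthy_metal,incomplete,irregular_metal,other,verified
7a1f731e,0.0,1.0,0.0,0.0,0.0,True
7a424ad8,0.0,0.0,0.0,1.0,0.0,False
7a3edc5e,1.0,0.0,0.0,0.0,0.0,True
"""


def verified_rows(csv_file):
    """Yield only the rows whose `verified` column reads as true."""
    for row in csv.DictReader(csv_file):
        if row["verified"].strip().lower() == "true":
            yield row


kept = list(verified_rows(io.StringIO(SAMPLE)))
print([row["id"] for row in kept])
```

To run this on the real file, pass an open file handle instead of the `io.StringIO` sample.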

Labels

Each image¹ corresponds to train and test GeoJSONs, where labels are encoded as FeatureCollections. metadata.csv links each image with its corresponding GeoJSON. For each area in the train set, the GeoJSON includes the unique building ID, building footprint, roof material, and verified field (see note above). For each area in the test set, the GeoJSON contains just the unique building ID and building footprint.
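
The following sketch shows one way to read such a FeatureCollection with the standard `json` module. The sample feature, the placement of the building ID, and the property names `roof_material` and `verified` are assumptions about the schema, so inspect a real file before relying on them. Only simple `Polygon` footprints are handled here.

```python
import json

# Hypothetical one-building FeatureCollection. Where the ID lives and the
# property names ("roof_material", "verified") are assumed, not confirmed.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "7a1f731e",
      "properties": {"roof_material": "healthy_metal", "verified": true},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [0.0, 0.0]]]
      }
    }
  ]
}
"""


def bounding_box(polygon_coords):
    """Return (min_x, min_y, max_x, max_y) of a Polygon's exterior ring."""
    exterior = polygon_coords[0]
    xs = [point[0] for point in exterior]
    ys = [point[1] for point in exterior]
    return min(xs), min(ys), max(xs), max(ys)


def read_buildings(text):
    """Parse a FeatureCollection into a list of plain dictionaries."""
    collection = json.loads(text)
    buildings = []
    for feature in collection["features"]:
        props = feature.get("properties") or {}
        buildings.append({
            # Fall back to a property if the ID is not a top-level member.
            "id": feature.get("id", props.get("id")),
            # Test GeoJSONs carry no label, so this may be None.
            "roof_material": props.get("roof_material"),
            "verified": props.get("verified"),
            "bbox": bounding_box(feature["geometry"]["coordinates"]),
        })
    return buildings


buildings = read_buildings(SAMPLE_GEOJSON)
print(buildings[0])
```

The bounding box is in whatever coordinate reference system the GeoJSON uses; cropping the matching pixels from the image additionally requires the image's georeferencing.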

Roof material labels are also provided in train_labels.csv, where each row contains a unique building ID followed by five roof material columns, with a 1.0 indicating that building's roof type and 0.0s in the remaining columns. Each building has only one roof type.
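
If you prefer a single class name per building over five indicator columns, a small helper along these lines can do the conversion. The function name and the sample row are hypothetical; the column names follow the roof material names used in this description.

```python
ROOF_CLASSES = [
    "concrete_cement",
    "healthy_metal",
    "incomplete",
    "irregular_metal",
    "other",
]


def one_hot_to_label(row):
    """Return the single roof class flagged with 1.0 in a label row."""
    hot = [name for name in ROOF_CLASSES if float(row[name]) == 1.0]
    if len(hot) != 1:
        # Each building should have exactly one roof type.
        raise ValueError(f"expected exactly one class, found {hot}")
    return hot[0]


# Hypothetical row, shaped like what csv.DictReader would return.
example_row = {
    "id": "7a1f731e",
    "concrete_cement": "0.0",
    "healthy_metal": "0.0",
    "incomplete": "0.0",
    "irregular_metal": "1.0",
    "other": "0.0",
}
label = one_hot_to_label(example_row)
print(label)
```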

Here are examples of each roof type from the Borde Soacha area.

Roof Material	Description	Count
`concrete_cement`	Roofs are made of concrete or cement.	1518
`healthy_metal`	Includes corrugated metal, galvanized sheeting, and other metal materials.	14817
`incomplete`	Under construction, extremely haphazard, or damaged.	669
`irregular_metal`	Includes metal roofing with rusting, patching, or some damage. These roofs carry a higher risk.	5241
`other`	Includes shingles, tiles, red painted, or other material.	308
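
The counts in the table are heavily imbalanced, which matters for a probability-based metric. As a rough illustration, the sketch below turns the counts listed above into class frequencies; using those frequencies as a constant prediction is a common naive baseline, not something this challenge prescribes.

```python
# Counts copied from the roof material table above.
counts = {
    "concrete_cement": 1518,
    "healthy_metal": 14817,
    "incomplete": 669,
    "irregular_metal": 5241,
    "other": 308,
}

total = sum(counts.values())

# Relative frequency of each class among the labeled buildings.
priors = {name: count / total for name, count in counts.items()}

for name, share in sorted(priors.items(), key=lambda item: -item[1]):
    print(f"{name:16s} {share:.3f}")
```

`healthy_metal` accounts for roughly two thirds of the labeled buildings, while `other` and `incomplete` are rare, so you may want to look at per-class performance as well as the overall score.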

Data format

A STAC (SpatioTemporal Asset Catalog)² of the imagery and label data is provided. The STAC is organized with a root catalog, containing sub-catalogs for each country. Each country contains collections for the various areas within that country.

An area collection links to STAC items — one for the imagery, one for the training label data, and if that region has test building footprints, an item for those labels. The imagery STAC item geometry is the footprint of the image. The training data label items have overviews that give the class counts for each of the roof_material classes contained in the labeled data.
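
Because the catalog uses relative links, it can be traversed with nothing more than the standard library. The sketch below follows `child` and `item` links from a root catalog. The file names and the tiny catalog built in the temporary directory are hypothetical stand-ins for the real layout, which may differ.

```python
import json
import os
import tempfile


def walk_stac(path):
    """Yield (path, document) for a STAC file and everything below it."""
    with open(path, encoding="utf-8") as handle:
        doc = json.load(handle)
    yield path, doc
    base = os.path.dirname(path)
    for link in doc.get("links", []):
        # Only descend; "root", "parent", and "self" links are skipped.
        if link.get("rel") in ("child", "item"):
            target = os.path.normpath(os.path.join(base, link["href"]))
            yield from walk_stac(target)


# Build a tiny, made-up catalog so that the sketch runs on its own.
with tempfile.TemporaryDirectory() as root:

    def write(rel_path, doc):
        full = os.path.join(root, rel_path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w", encoding="utf-8") as handle:
            json.dump(doc, handle)

    write("catalog.json", {
        "id": "root",
        "links": [{"rel": "child", "href": "./colombia/catalog.json"}],
    })
    write("colombia/catalog.json", {
        "id": "colombia",
        "links": [{"rel": "child", "href": "./borde_rural/collection.json"}],
    })
    write("colombia/borde_rural/collection.json", {
        "id": "borde_rural",
        "links": [
            {"rel": "root", "href": "../../catalog.json"},
            {"rel": "item", "href": "./imagery.json"},
        ],
    })
    write("colombia/borde_rural/imagery.json", {
        "id": "borde_rural-imagery",
        "links": [],
    })

    visited = [doc["id"] for _, doc in walk_stac(os.path.join(root, "catalog.json"))]

print(visited)
```

Dedicated STAC client libraries exist as well; this version only illustrates how the root catalog, country sub-catalogs, area collections, and items hang together.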

Performance metric

We'll measure prediction error with a metric called log loss. This is an error metric, so a lower value is better (as opposed to an accuracy metric, where a higher value is better). Log loss is calculated as follows:

$$loss = -\frac{1}{N}\cdot\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{M} y_{ij}\log p_{ij}$$

where $N$ is the number of observations, $M$ is the number of classes (for this problem, $M=5$), $y_{ij}$ is a binary variable indicating whether label $j$ is the true label for observation $i$, and $p_{ij}$ is the user-predicted probability that label $j$ applies to observation $i$.

In Python you can easily calculate log loss using the scikit-learn function sklearn.metrics.log_loss. R users can use the MultiLogLoss function in the MLmetrics package.
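
To make the formula concrete, here is a dependency-free sketch that evaluates it directly. The clipping constant `eps` is an assumption added to avoid taking the logarithm of zero; how the official scorer handles probabilities of exactly 0 or 1 is not stated here.

```python
import math


def log_loss(y_true, y_pred, eps=1e-15):
    """Multiclass log loss for one-hot labels and predicted probabilities."""
    total = 0.0
    for truth, probs in zip(y_true, y_pred):
        for y_ij, p_ij in zip(truth, probs):
            if y_ij:
                # Clip so that log() never sees exactly 0 or 1 (assumed eps).
                p = min(max(p_ij, eps), 1.0 - eps)
                total -= math.log(p)
    return total / len(y_true)


# Two made-up buildings: true classes are healthy_metal and irregular_metal.
y_true = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
y_pred = [
    [0.1, 0.6, 0.1, 0.1, 0.1],  # fairly confident and correct
    [0.2, 0.2, 0.2, 0.2, 0.2],  # uniform guess
]

score = log_loss(y_true, y_pred)
print(round(score, 4))
```

Only the probability assigned to the true class contributes to the sum, so a confident wrong prediction is penalized far more heavily than a hedged one.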

Submission format

The format for the submission file is the building id followed by the five roof material types, with a floating point representation of the probability that each roof type applies to the building. Since you are submitting probabilities, make sure there is a decimal point in your submission. Probabilities range from 0.0 to 1.0. Remember that this is a multiclass, but not multilabel, problem.

For example, if you predicted concrete_cement with a probability of 0.9 for the first five buildings,

id	concrete_cement	healthy_metal	incomplete	irregular_metal	other
7a4d630a	0.9	0.0	0.0	0.0	0.0
7a4bbbd6	0.9	0.0	0.0	0.0	0.0
7a4ac744	0.9	0.0	0.0	0.0	0.0
7a4881fa	0.9	0.0	0.0	0.0	0.0
7a4aa4a8	0.9	0.0	0.0	0.0	0.0

the .csv file you submit would look like:

id,concrete_cement,healthy_metal,incomplete,irregular_metal,other
7a4d630a,0.9,0.0,0.0,0.0,0.0
7a4bbbd6,0.9,0.0,0.0,0.0,0.0
7a4ac744,0.9,0.0,0.0,0.0,0.0
7a4881fa,0.9,0.0,0.0,0.0,0.0
7a4aa4a8,0.9,0.0,0.0,0.0,0.0
⁝
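
A file in this layout can be produced with the standard `csv` module, as in the sketch below. The function name is hypothetical, and rescaling each row so that its probabilities sum to 1 is a design choice of this sketch (natural for a multiclass problem), not a stated requirement of the format. The fixed-point formatting guarantees a decimal point in every value.

```python
import csv
import io

COLUMNS = [
    "id",
    "concrete_cement",
    "healthy_metal",
    "incomplete",
    "irregular_metal",
    "other",
]


def write_submission(predictions, out_file):
    """Write {building_id: [five probabilities]} in the submission layout."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(COLUMNS)
    for building_id, probs in predictions.items():
        if len(probs) != len(COLUMNS) - 1:
            raise ValueError(f"{building_id}: expected 5 probabilities")
        total = sum(probs)
        if total <= 0:
            raise ValueError(f"{building_id}: probabilities must sum to > 0")
        # Rescale so the row sums to 1 (a choice made in this sketch).
        normalized = [p / total for p in probs]
        writer.writerow([building_id] + [f"{p:.6f}" for p in normalized])


buffer = io.StringIO()
write_submission({"7a4d630a": [0.9, 0.025, 0.025, 0.025, 0.025]}, buffer)
print(buffer.getvalue())
```

To write to disk, open the target with `open(path, "w", newline="")` and pass the handle in place of `buffer`.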

Good luck!

Good luck and enjoy this problem! If you're planning to use MATLAB for your solution, be sure to request your complimentary license and check out the MATLAB starter code in the benchmark!

If you have any questions you can always visit the user forum!

¹ Neither the Castries nor the Gros Islet area has a "test-" GeoJSON file. The ground truth labels for those areas come from unverified predictions, so they are not tested against.
² This STAC uses version 0.8 of the spec, for which a release candidate was recently published. It also uses the label extension, which is still in the proposal phase. The STAC is a self-contained catalog and uses relative links.