Open Cities AI Challenge: Segmenting Buildings for Disaster Resilience

Can you map building footprints from drone imagery? This semantic segmentation challenge leverages computer vision and data from OpenStreetMap to support disaster risk management efforts in cities across Africa. #disasters

$15,000 in prizes
Mar 2020
1,099 joined

Problem description

In this challenge, you will be segmenting building footprints from aerial imagery. The data consists of drone imagery from 10 different cities and regions across Africa. Your goal is to classify the presence or absence of a building on a pixel-by-pixel basis.

The features in this dataset

The features in this dataset are the images themselves and the building footprints in the GeoJSONs, which can be used to train a building segmentation model. All training labels (with the exception of those for the Zanzibar images) are pulled from OpenStreetMap.


The images are stored as large Cloud Optimized GeoTiffs (COG). Spatial resolution varies from region to region. All images include 4 bands: red, green, blue and alpha. The alpha band can be used to mask out NoData values.
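Because the COGs are large, it is practical to read them one window at a time and use the alpha band to flag NoData pixels. The sketch below shows one way to do this with rasterio; the helper names are my own and the scene filename in the usage note is a placeholder, not a real competition path.

```python
import numpy as np


def split_alpha(bands):
    """Split a (4, H, W) RGBA array into RGB bands and a boolean
    validity mask derived from the alpha band (alpha == 0 means NoData)."""
    rgb, alpha = bands[:3], bands[3]
    return rgb, alpha != 0


def read_masked_window(path, col_off, row_off, width, height):
    """Read one window of a large COG so the full scene never has to
    fit in memory, then split off the alpha band as a NoData mask."""
    import rasterio  # deferred so split_alpha stays usable without rasterio
    from rasterio.windows import Window

    with rasterio.open(path) as src:
        bands = src.read(window=Window(col_off, row_off, width, height))
    return split_alpha(bands)
```

For example, `read_masked_window("665946.tif", 0, 0, 1024, 1024)` (hypothetical filename) would return a `(3, 1024, 1024)` RGB array plus a `(1024, 1024)` boolean mask.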

Given that the labels vary in quality (e.g. how exhaustively an image is labeled, how accurate the building footprints are), the training data have been divided up into tier 1 and tier 2 subsets. The tier 1 images have more complete labels than tier 2. We encourage you to begin training models on the tier 1 data before trying to incorporate tier 2.

All training images have been reprojected to the appropriate UTM zone projection for the region that they represent.

Training data summary

| City | Data class | Scene count | AOI area (sq km) | Building count | Total building size (sq km) | Average building size (sq m) | Building ratio (portion of area covered by bldgs) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| acc | train_tier_1 | 4 | 7.86 | 33585 | 2.85 | 84.84 | 0.36 |
| dar | train_tier_1 | 6 | 42.90 | 121171 | 12.02 | 99.20 | 0.28 |
| dar | train_tier_2 | 31 | 223.28 | 571047 | 53.77 | 94.16 | 0.24 |
| gao | train_tier_2 | 2 | 12.54 | 15792 | 1.28 | 81.05 | 0.10 |
| kam | train_tier_1 | 1 | 1.14 | 4056 | 0.22 | 53.14 | 0.19 |
| kin | train_tier_2 | 2 | 1.01 | 2357 | 0.17 | 71.29 | 0.17 |
| mah | train_tier_2 | 4 | 19.40 | 7313 | 1.51 | 206.48 | 0.08 |
| mon | train_tier_1 | 4 | 2.90 | 6947 | 1.05 | 150.71 | 0.36 |
| nia | train_tier_1 | 1 | 0.68 | 634 | 0.03 | 47.43 | 0.04 |
| nia | train_tier_2 | 2 | 2.46 | 7444 | 0.47 | 62.76 | 0.19 |
| ptn | train_tier_1 | 2 | 1.87 | 8731 | 0.64 | 72.73 | 0.34 |
| znz | train_tier_1 | 13 | 102.61 | 13407 | 1.62 | 120.83 | 0.02 |

Tier 1 sample

| Area (abbreviation) | Scene ID | Resolution | Pixel width x height |
| --- | --- | --- | --- |
| Accra (acc) | 665946 | 2 cm | 84466 x 150147 |
| Kampala (kam) | 4e7c7f | 4 cm | 39270 x 40024 |
| Pointe-Noire (ptn) | f49f31 | 20 cm | 6605 x 4185 |
| Zanzibar (znz) | aee7fd | 7 cm | 40551 x 40592 |

Tier 2 sample

| Area (abbreviation) | Scene ID | Resolution | Pixel width x height |
| --- | --- | --- | --- |
| Ngaoundere (gao) | 4f38e1 | 5 cm | 56883 x 59802 |
| Mahe Island (mah) | 71e6c2 | 7 cm | 52517 x 91616 |
| Dar es Salaam (dar) | ef8f27 | 7 cm | 50259 x 48185 |

Test data

The test set consists of 11,481 1024 x 1024 pixel COG "chips" derived from a number of different scenes. None of these scenes are included in the training set. Some of the test scenes are from regions that are present in the training set while others are not. The correct georeferences for the test chips have been removed. The test set labels (unavailable to participants) have a level of accuracy commensurate with the tier 1 data.

Example test set chip



Each image in the train set corresponds to a GeoJSON label file in which labels are encoded as a FeatureCollection. The geometry field of each feature provides the outline of a building in the image. There are additional fields, such as building:material, which you are free to use in training your models, but keep in mind that none of this metadata will be provided for the test chips. Your goal is only to classify the presence (or absence) of a building on a pixel-by-pixel basis.

train_metadata.csv links each image in the train set with its corresponding GeoJSON label file. This csv also includes the region and tier of each image. Note that region information is not provided for the test set.
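For instance, you might pair images with labels from this csv using pandas. The column names used below ('img_uri', 'label_uri', 'tier') are assumptions for illustration only; check them against the actual header of train_metadata.csv before relying on this.

```python
import pandas as pd


def tier1_pairs(metadata):
    """Return (image path, label path) pairs for tier 1 scenes only.
    Column names here are assumed, not taken from the real csv."""
    tier1 = metadata[metadata["tier"] == 1]
    return list(zip(tier1["img_uri"], tier1["label_uri"]))
```

Usage would be something like `tier1_pairs(pd.read_csv("train_metadata.csv"))`.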

Label GeoJSON files have been clipped to the extents of the non-NoData portions of the images, so all building geometries will overlap with image data.

Example training data image and labels


As previously mentioned, tier 1 labels are generally more accurate than those in tier 2.

Tier 1 label example (Kampala) Tier 2 label example (Dar es Salaam)

Data format

The metadata for the competition datasets are stored in SpatioTemporal Asset Catalogs (STACs). A STAC is a standardized specification that allows you to easily query geospatial imagery and labels. A STAC is composed of a series of JSON files that reference each other as well as the geospatial assets (e.g. imagery, labels) they describe. PySTAC is a simple Python library for working with STAC objects.

The competition data is organized into three STACs: train_tier_1, train_tier_2 and test. For examples from the competition STACs as well as starter code for working with STACs in Python, check out the STAC resources page.

Train STACs

Each of these catalogs contains a collection for each of the regions that are included in that subset of training data. Within each region collection are the COGs and GeoJSON files, represented as STAC Items and LabelItems, respectively. The JSON files (e.g. catalog.json, collection.json, b15fce.json) include spatial and temporal information about the objects and assets included below them. They also reference their 'child' and 'parent' objects, enabling you to easily traverse the file tree.


Test STAC

The test STAC covers the chips in the test set. It consists of one catalog and 11,481 Items (one for each chip). It is simpler than the training data STACs because all the image items link directly to the root catalog. There are also, naturally, no LabelItems.

Similarly to the training data STACs, the test set chip COGs are each in their own directory along with the STAC Item JSON.

Performance metric

To measure your model's performance, we'll use a metric called the Jaccard index. This is a similarity measure between two label sets, defined as the size of their intersection divided by the size of their union. It is an accuracy metric, so a higher value is better (as opposed to an error metric, where a lower value is better). The Jaccard index can be calculated as follows:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$

where $A$ is the set of true labels and $B$ is the set of predicted labels.

In Python you can easily calculate the Jaccard index using the scikit-learn function sklearn.metrics.jaccard_score(y_true, y_pred, average='micro'). R users may find the jaccard function in the jaccard package.
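For instance, the following sketch applies the formula above to two tiny flattened masks; with average='micro', scikit-learn pools intersections and unions across both classes (building and background) before dividing.

```python
import numpy as np
from sklearn.metrics import jaccard_score

# Two tiny 2x2 "masks": flatten before scoring, since jaccard_score
# expects 1-D label arrays.
y_true = np.array([[1, 1],
                   [0, 0]]).ravel()
y_pred = np.array([[1, 0],
                   [0, 0]]).ravel()

# average='micro' pools both classes:
# class 1: |A ∩ B| = 1, |A ∪ B| = 2; class 0: 2 and 3 -> (1 + 2) / (2 + 3) = 0.6
score = jaccard_score(y_true, y_pred, average="micro")
```

Note that this differs from scoring the building class alone (average='binary' would give 1/2 = 0.5 here).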

Submission format

You must submit your predictions in the form of single-band 1024 x 1024 TIFFs, zipped into a single file. The lone band should consist of True (building footprint) and False (not a building) pixel values. Alternatively, 0s and 1s can be used. The example chip above would look like the following:

Example submission

The format for the submission is a .tar or .zip file containing a building footprint mask for each chip in the test set. Each mask must have the same name as its corresponding imagery chip in the test set. The order of the files does not matter.

For example, the first few files in your uncompressed submission might look like:


Responsible AI

Data scientists are uniquely positioned to examine the ethical implications of their work and strive to mitigate unfair biases or potential harms. In this competition, we are excited to introduce a Responsible AI track with $3,000 in bonus prizes, which asks you to examine the practical ethics and appropriate use of ML/AI in the field of disaster risk management.

Read the full instructions here.

Segmentation track participants must submit at least once to the Responsible AI track to qualify for the $12,000 in segmentation performance prizes.

Good luck!

If you're wondering how to get started, check out our benchmark blog post!

Enjoy this problem, and if you have any questions you can always visit the user forum!