Problem description

In this challenge, you will be segmenting building footprints from aerial imagery. The data consists of drone imagery from 10 different cities and regions across Africa. Your goal is to classify the presence or absence of a building on a pixel-by-pixel basis.

The features in this dataset

The features in this dataset are the images themselves and the building footprints in the GeoJSONs, which can be used to train a building segmentation model. All training data (with the exception of the labels for images of Zanzibar) are pulled from OpenStreetMap.

Images

The images are stored as large Cloud Optimized GeoTIFFs (COGs). Spatial resolution varies from region to region. All images include 4 bands: red, green, blue and alpha. The alpha band can be used to mask out NoData values.
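As a minimal sketch of the alpha-band masking described above, the snippet below assumes a window of a scene has already been read into a NumPy array of shape (4, height, width) in red, green, blue, alpha order, and that an alpha value of 0 marks NoData. That convention is common but is an assumption here, so check it against the scenes you load. The function name valid_data_mask is hypothetical.

```python
import numpy as np


def valid_data_mask(rgba: np.ndarray) -> np.ndarray:
    """Return a boolean (height, width) mask that is True where pixels hold data.

    Assumes band order red, green, blue, alpha and that alpha == 0 means NoData.
    """
    if rgba.ndim != 3 or rgba.shape[0] != 4:
        raise ValueError("expected an array of shape (4, height, width)")
    return rgba[3] > 0


# Tiny synthetic window: 2 x 2 pixels, right-hand column is NoData.
window = np.zeros((4, 2, 2), dtype=np.uint8)
window[:3] = 120                    # arbitrary RGB values
window[3] = [[255, 0], [255, 0]]    # alpha band

mask = valid_data_mask(window)
rgb_masked = np.where(mask, window[:3], 0)  # zero out RGB where there is no data
```

The same boolean mask can also be used to exclude NoData pixels from a training loss instead of zeroing them.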

Given that the labels vary in quality (e.g. how exhaustively an image is labeled, how accurate the building footprints are), the training data have been divided up into tier 1 and tier 2 subsets. The tier 1 images have more complete labels than tier 2. We encourage you to begin training models on the tier 1 data before trying to incorporate tier 2.

All training images have been reprojected to the appropriate UTM zone projection for the region that they represent.
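If you need to work out which UTM zone a region falls in, the standard zone arithmetic can be sketched as follows. This assumes the images use the WGS 84 / UTM family of projections (EPSG 326xx in the northern hemisphere, 327xx in the southern), which the text above does not state explicitly. The function name utm_epsg is hypothetical, and the coordinates in the usage lines are approximate city-centre values, not taken from the dataset.

```python
def utm_epsg(lon: float, lat: float) -> int:
    """Return the EPSG code of the standard WGS 84 / UTM zone containing a point.

    Ignores the special-case zones around Norway and Svalbard, which do not
    affect the African regions in this dataset.
    """
    if not -180.0 <= lon <= 180.0 or not -80.0 <= lat <= 84.0:
        raise ValueError("point lies outside the UTM domain")
    zone = min(int((lon + 180.0) // 6) + 1, 60)  # zones 1..60, each 6 degrees wide
    base = 32600 if lat >= 0 else 32700          # northern vs southern hemisphere
    return base + zone


# Approximate coordinates (assumed for illustration only).
dar_epsg = utm_epsg(39.28, -6.80)   # Dar es Salaam, zone 37 south
acc_epsg = utm_epsg(-0.19, 5.60)    # Accra, zone 30 north
```

In practice the projection recorded in each GeoTIFF is the authoritative source; this arithmetic is only a cross-check.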

Training data summary

City	Data class	Scene count	AOI area (sq km)	Building count	Total building size (sq km)	Average building size (sq m)	Building ratio (portion of area covered by bldgs)
acc	train_tier_1	4	7.86	33585	2.85	84.84	0.36
dar	train_tier_1	6	42.90	121171	12.02	99.20	0.28
dar	train_tier_2	31	223.28	571047	53.77	94.16	0.24
gao	train_tier_2	2	12.54	15792	1.28	81.05	0.10
kam	train_tier_1	1	1.14	4056	0.22	53.14	0.19
kin	train_tier_2	2	1.01	2357	0.17	71.29	0.17
mah	train_tier_2	4	19.40	7313	1.51	206.48	0.08
mon	train_tier_1	4	2.90	6947	1.05	150.71	0.36
nia	train_tier_1	1	0.68	634	0.03	47.43	0.04
nia	train_tier_2	2	2.46	7444	0.47	62.76	0.19
ptn	train_tier_1	2	1.87	8731	0.64	72.73	0.34
znz	train_tier_1	13	102.61	13407	1.62	120.83	0.02

Tier 1 sample

Area (abbreviation)	Scene ID	Resolution	Pixel width x height
Accra (acc)	665946	2 cm	84466 x 150147
Kampala (kam)	4e7c7f	4 cm	39270 x 40024
Pointe-Noire (ptn)	f49f31	20 cm	6605 x 4185
Zanzibar (znz)	aee7fd	7 cm	40551 x 40592

Tier 2 sample

Area (abbreviation)	Scene ID	Resolution	Pixel width x height
Ngaoundere (gao)	4f38e1	5 cm	56883 x 59802
Mahe Island (mah)	71e6c2	7 cm	52517 x 91616
Dar es Salaam (dar)	ef8f27	7 cm	50259 x 48185

Test data

The test set consists of 11,481 1024 x 1024 pixel COG "chips" derived from a number of different scenes. None of these scenes are included in the training set. Some of the test scenes are from regions that are present in the training set while others are not. The correct georeferences for the test chips have been removed. The test set labels (unavailable to participants) have a level of accuracy commensurate with the tier 1 data.
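Because the test data arrives as 1024 x 1024 pixel chips, one option is to cut the large training scenes into windows of the same pixel size. The sketch below only computes window offsets. How the organizers actually generated the test chips is not stated, so treat this as one possible tiling scheme. The function name chip_windows is hypothetical.

```python
from typing import Iterator, Tuple

CHIP_SIZE = 1024  # test chips are 1024 x 1024 pixels


def chip_windows(
    width: int, height: int, size: int = CHIP_SIZE
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (col_off, row_off, win_width, win_height) windows covering a scene.

    Windows on the right and bottom edges are truncated rather than padded.
    """
    for row_off in range(0, height, size):
        for col_off in range(0, width, size):
            yield (
                col_off,
                row_off,
                min(size, width - col_off),
                min(size, height - row_off),
            )


# Pointe-Noire sample scene from the Tier 1 sample table: 6605 x 4185 pixels.
windows = list(chip_windows(6605, 4185))
```

Each tuple can be passed to whichever raster library you use for windowed reads, which avoids loading a whole COG into memory.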

Example test set chip

Labels

Each image in the train set corresponds to a GeoJSON, where labels are encoded as FeatureCollections. geometry provides the outline of each building in the image. There are additional fields like building:material which you are free to use in training your models, but keep in mind none of this metadata will be provided for the test chips. Your goal is only to classify the presence (or lack thereof) of a building on a pixel-by-pixel basis.
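To illustrate the label structure, the sketch below parses a FeatureCollection with the standard json module and collects the outer ring of each building outline. The sample GeoJSON is hypothetical and heavily simplified: its coordinates and its building:material value are made up, and real label files may carry more fields. The function name building_polygons is also hypothetical.

```python
import json

# Hypothetical, simplified label file with a single building footprint.
sample_geojson = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"building:material": "concrete"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0.0, 0.0], [10.0, 0.0], [10.0, 5.0], [0.0, 5.0], [0.0, 0.0]]]
      }
    }
  ]
}
"""


def building_polygons(collection: dict) -> list:
    """Return the exterior ring of every Polygon or MultiPolygon part."""
    rings = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Polygon":
            rings.append(geometry["coordinates"][0])  # first ring is the exterior
        elif geometry.get("type") == "MultiPolygon":
            rings.extend(polygon[0] for polygon in geometry["coordinates"])
    return rings


rings = building_polygons(json.loads(sample_geojson))
```

Interior rings (holes such as courtyards) are ignored here for brevity; keep them if you want them rasterized as non-building pixels.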

train_metadata.csv links each image in the train set with its corresponding GeoJSON label file. This CSV also includes the region and tier of the image. Note that region information is not provided for the test set.

Label GeoJSON files have been clipped to the extents of the non-NoData portions of the images, so all building geometries will overlap with image data.

Example training data image and labels

As previously mentioned, tier 1 labels are generally more accurate than those in tier 2.

Tier 1 label example (Kampala)	Tier 2 label example (Dar es Salaam)

Data format

The metadata for the competition datasets are stored in SpatioTemporal Asset Catalogs (STACs). STAC is a standardized specification that allows you to easily query geospatial imagery and labels. A STAC is made up of a series of JSON files that reference each other as well as the geospatial assets (e.g. imagery, labels) they describe. PySTAC is a simple Python library for working with STAC objects.

The competition data is organized into three STACs: train_tier_1, train_tier_2 and test. For examples from the competition STACs as well as starter code for working with STACs in Python, check out the STAC resources page.

Train STACs

Each of these catalogs contains a collection for each of the regions included in that subset of training data. Within each region are the COGs and GeoJSON files, which are represented as STAC Items and LabelItems, respectively. The JSON files (e.g. catalog.json, collection.json, b15fce.json) include spatial and temporal information about the objects and assets included below them. They also reference their 'child' and 'parent' objects, enabling you to easily traverse the file tree.
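The traversal described above can be sketched with the standard library alone, by following the 'child' and 'item' links that the STAC specification defines. The function name iter_stac_items is hypothetical, and the sketch assumes every href is a local path relative to the JSON file that contains it. The mock catalog built in the second half is a stand-in for illustration: its layout and the asset key "image" are assumptions, not the competition's actual file tree. PySTAC offers equivalent traversal if you prefer a library.

```python
import json
import tempfile
from pathlib import Path


def iter_stac_items(catalog_path: Path):
    """Recursively yield (path, item_dict) for every Item reachable from a catalog."""
    catalog = json.loads(catalog_path.read_text())
    for link in catalog.get("links", []):
        rel = link.get("rel")
        if rel not in ("child", "item"):
            continue  # skip 'parent', 'root', 'self' and similar links
        target = (catalog_path.parent / link["href"]).resolve()
        if rel == "child":
            yield from iter_stac_items(target)      # nested catalog or collection
        else:
            yield target, json.loads(target.read_text())


# Build a tiny mock catalog (hypothetical layout) and walk it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "acc").mkdir()
    (root / "catalog.json").write_text(json.dumps({
        "type": "Catalog", "id": "train_tier_1",
        "links": [{"rel": "child", "href": "./acc/collection.json"}],
    }))
    (root / "acc" / "collection.json").write_text(json.dumps({
        "type": "Collection", "id": "acc",
        "links": [{"rel": "parent", "href": "../catalog.json"},
                  {"rel": "item", "href": "./665946.json"}],
    }))
    (root / "acc" / "665946.json").write_text(json.dumps({
        "type": "Feature", "id": "665946",
        "assets": {"image": {"href": "./665946.tif"}},
    }))
    item_ids = [item["id"] for _, item in iter_stac_items(root / "catalog.json")]
```

Ignoring 'parent' and 'root' links keeps the walk from looping back up the tree.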

Test STAC

The test STAC covers the chips in the test set. It consists of one catalog and 11,481 Items (one for each chip). It is simpler than the training data STACs because all the image items link directly to the root catalog. There are also, naturally, no LabelItems.

Similarly to the training data STACs, the test set chip COGs are each in their own directory along with the STAC Item JSON.

Performance metric

To measure your model's performance, we'll use a metric called the Jaccard index. This is a similarity measure between two label sets, and is defined as the size of their intersection divided by the size of their union. It is an accuracy metric, so a higher value is better (as opposed to an error metric, where a lower value is better). The Jaccard index can be calculated as follows:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$

where $A$ is the set of true labels and $B$ is the set of predicted labels.

In Python you can easily calculate the Jaccard index using the scikit-learn function sklearn.metrics.jaccard_score(y_true, y_pred, average='micro'). R users may find the jaccard function in the jaccard package.
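To make the set definition concrete, here is a small worked sketch that treats each mask as a set of (row, col) building-pixel coordinates. It is an illustration of the formula, not the scoring code used by the competition. The function name jaccard_index is hypothetical, and returning 1.0 when both sets are empty is an assumption about a case the definition leaves undefined.

```python
def jaccard_index(true_pixels: set, pred_pixels: set) -> float:
    """Jaccard index between two sets of building-pixel coordinates."""
    union = true_pixels | pred_pixels
    if not union:
        return 1.0  # assumption: two empty masks count as a perfect match
    return len(true_pixels & pred_pixels) / len(union)


# Building pixels as (row, col) pairs on a small grid.
A = {(0, 0), (0, 1), (1, 0), (1, 1)}   # true labels: a 2 x 2 building
B = {(0, 1), (1, 1), (0, 2), (1, 2)}   # prediction shifted one column to the right

score = jaccard_index(A, B)  # intersection has 2 pixels, union has 6
```

Here the two masks share 2 pixels out of 6 in their union, so the score is 2/6, and the union size also equals |A| + |B| - |A ∩ B| = 4 + 4 - 2.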

Submission format

You must submit your predictions in the form of single-band 1024 x 1024 TIFFs, zipped into a single file. The lone band should consist of True (building footprint) and False (not a building) pixel values. Alternatively, 0s and 1s can be used. For the example chip above, the submission would look like the following:

Example submission

The format for the submission is a .tar or .zip file containing a building footprint mask for each chip in the test set. Each mask must have the same name as its corresponding imagery chip in the test set. The order of the files does not matter.

For example, the first few files in your uncompressed submission.zip might look like:

eb1a85.TIFF
21afdb.TIFF
c475ed.TIFF
356b9d.TIFF
bb3821.TIFF
...
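Before uploading, it can help to confirm that every test chip has a same-named mask in the archive. The sketch below does this for a .zip file with the standard zipfile module. The function name missing_masks is hypothetical, and the check assumes masks sit at the top level of the archive, which the format description above suggests but does not state outright. The demo archive contains placeholder bytes rather than real TIFFs.

```python
import tempfile
import zipfile
from pathlib import Path


def missing_masks(submission_zip: Path, chip_names: list) -> list:
    """Return chip file names that have no same-named entry in the archive."""
    with zipfile.ZipFile(submission_zip) as archive:
        present = set(archive.namelist())  # assumes masks are at the top level
    return [name for name in chip_names if name not in present]


# Demo with a throwaway archive holding two of three expected masks.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "submission.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        for name in ["eb1a85.TIFF", "21afdb.TIFF"]:
            archive.writestr(name, b"placeholder, not a real TIFF")
    missing = missing_masks(
        zip_path, ["eb1a85.TIFF", "21afdb.TIFF", "c475ed.TIFF"]
    )
```

An empty result means every listed chip has a mask; the check says nothing about whether each mask is a valid single-band 1024 x 1024 TIFF.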

Responsible AI

Data scientists are uniquely positioned to examine the ethical implications of their work and strive to mitigate unfair biases or potential harms. In this competition, we are excited to introduce a Responsible AI track with $3,000 in bonus prizes, which asks you to examine the practical ethics and appropriate use of ML/AI in the field of disaster risk management.

Read the full instructions here.

Segmentation track participants must submit at least once to the Responsible AI track to qualify for the $12,000 in segmentation performance prizes.

Good luck!

If you're wondering how to get started, check out our benchmark blog post!

Good luck and enjoy this problem! If you have any questions you can always visit the user forum!

Open Cities AI Challenge: Segmenting Buildings for Disaster Resilience

Quick Facts

Winner

qubvel