
On Cloud N: Cloud Cover Detection Challenge

Clouds obscure important ground-level features in satellite images, complicating their use in downstream applications. Build algorithms for cloud cover detection using a new cloud dataset and Microsoft's Planetary Computer!

$20,000 in prizes

Feb 2022

848 joined


Problem description

In this challenge, your goal is to label which pixels in satellite imagery contain clouds. In many uses of satellite imagery, clouds obscure what we really care about in applications such as tracking wildfires, mapping deforestation, or visualizing crop health. Detecting clouds more accurately makes it possible to filter out this interference, unlocking the potential of a vast range of use cases.

The challenge uses publicly available satellite data from the Sentinel-2 mission, which captures wide-swath, high-resolution, multi-spectral imagery. The data is publicly shared through Microsoft's Planetary Computer.

Features
Images
Metadata
Example
Additional data

Labels
Label format
Example

Submission
Performance metric
Submission format
Example

Overview of the data provided for this competition:

.
├── train_features
│   └── ...
├── train_labels
│   └── ...
└── train_metadata.csv

Feature data

The dataset consists of Sentinel-2 satellite imagery stored as GeoTIFFs. There are almost 12,000 chips in the training data, collected between 2018 and 2020. Each chip is imagery of a specific area captured at a specific point in time.

Sentinel-2 flies over the part of the Earth between 56° South (Cape Horn, South America) and 82.8° North (above Greenland), so our observations are all between these two latitudes. The chips are mostly from Africa, South America, and Australia. For more background about how data from Sentinel-2 is collected, see the About page.

Images

The main features in this challenge are the satellite images themselves. There are four images associated with each chip. Each image within a chip captures light from a different range of wavelengths, or "band". For example, the B02 band for each chip shows the strength of visible blue light, which has a wavelength around 492 nanometers (nm). The bands provided are:

Band	Description	Center wavelength
B02	Blue visible light	492 nm
B03	Green visible light	560 nm
B04	Red visible light	665 nm
B08	Near infrared light	835 nm

Each band image is provided as a 512 x 512 GeoTIFF. The resolution, or real-world distance between pixels, is 10m. All four bands for a given chip cover the exact same area.
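All four bands of a chip share the same 512 x 512 grid, so they can be stacked into a single array for modeling or display. The sketch below is minimal and rests on two assumptions of ours: the channel order B02, B03, B04, B08, and a simple min/max rescaling for the true-color view. Neither comes from the challenge itself. `stack_bands` and `true_color` are hypothetical helper names.

```python
import numpy as np

BANDS = ["B02", "B03", "B04", "B08"]  # assumed channel order (our choice)

def stack_bands(band_arrays):
    """Stack a dict of {band name: (512, 512) array} into a (4, 512, 512) array."""
    return np.stack([band_arrays[b] for b in BANDS], axis=0)

def true_color(stack):
    """Build an RGB composite (red=B04, green=B03, blue=B02) rescaled to [0, 1]."""
    rgb = stack[[2, 1, 0]].astype(np.float32)  # reorder channels to R, G, B
    lo, hi = rgb.min(), rgb.max()
    # avoid dividing by zero on a constant image
    scaled = (rgb - lo) / (hi - lo) if hi > lo else np.zeros_like(rgb)
    return np.transpose(scaled, (1, 2, 0))  # (512, 512, 3), channels last for plotting
```

Each input array could come from `rasterio.open(path).read(1)`, which returns the file's single band as a 2-D array.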

The data download page includes instructions for how to download the training images. Within the folder train_features, there is a folder for every chip.

train_features
├── adwp  # chip with id adwp
│   ├── B02.tif
│   ├── B03.tif
│   ├── B04.tif
│   └── B08.tif
├── adwu
│   ├── B02.tif
│   ├── B03.tif
│   ├── B04.tif
│   └── B08.tif
└── ...

For example, the red visible light band of a chip with ID abcd in the training data would be saved as train_features/abcd/B04.tif.
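The folder layout above maps directly to file paths. The small sketch below builds them with `pathlib`. `feature_paths` and `label_path` are hypothetical helper names. The label location follows the `train_labels/<chip_id>.tif` convention given in the Labels section.

```python
from pathlib import Path

BANDS = ["B02", "B03", "B04", "B08"]

def feature_paths(chip_id, root="train_features"):
    """Return {band: path} for the four band GeoTIFFs of one chip."""
    return {band: Path(root) / chip_id / f"{band}.tif" for band in BANDS}

def label_path(chip_id, root="train_labels"):
    """Return the path of the cloud mask GeoTIFF for one chip."""
    return Path(root) / f"{chip_id}.tif"
```

For instance, `feature_paths("abcd")["B04"]` points to `train_features/abcd/B04.tif`.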

Each GeoTIFF contains a set of metadata including bounding coordinates, an affine transform, and its coordinate reference system (CRS) projection. In Python, you can access geospatial raster data and extract this metadata using the rasterio package.

import rasterio

# read the georeferencing metadata of one band without loading pixel data
with rasterio.open("train_features/cjge/B04.tif") as f:
    meta = f.meta
    bounds = f.bounds
print("Meta:", meta, "\nBounds:", bounds)

Meta: {'driver': 'GTiff',
  'dtype': 'uint16',
  'nodata': 0.0,
  'width': 512,
  'height': 512,
  'count': 1,
  'crs': CRS.from_epsg(32630),
  'transform': Affine(10.0, 0.0, 579060.0,
         0.0, -10.0, 3568130.0)}
Bounds: BoundingBox(left=579060.0, bottom=3563010.0, right=584180.0, top=3568130.0)
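The affine transform maps pixel positions to coordinates in the chip's CRS. Here that is EPSG:32630, a UTM zone measured in meters. The worked sketch below uses plain Python to recompute the bounds printed above from the six transform coefficients. `pixel_to_xy` and `bounds_from_transform` are hypothetical helpers for illustration; rasterio's transform objects already provide this mapping.

```python
def pixel_to_xy(transform, row, col):
    """Map the upper-left corner of pixel (row, col) to projected coordinates.

    transform holds the coefficients (a, b, c, d, e, f) in the order rasterio
    prints them: x = a*col + b*row + c, y = d*col + e*row + f.
    """
    a, b, c, d, e, f = transform
    return a * col + b * row + c, d * col + e * row + f

def bounds_from_transform(transform, width, height):
    """Compute (left, bottom, right, top) for a north-up raster (b == d == 0)."""
    left, top = pixel_to_xy(transform, 0, 0)
    right, bottom = pixel_to_xy(transform, height, width)  # far corner of the last pixel
    return left, bottom, right, top

T = (10.0, 0.0, 579060.0, 0.0, -10.0, 3568130.0)  # transform from the cjge B04 metadata
print(bounds_from_transform(T, 512, 512))
# (579060.0, 3563010.0, 584180.0, 3568130.0)
```

With rasterio itself, `f.transform * (col, row)` returns the same (x, y) pair. The 5,120 m span on each side is simply 512 pixels x 10 m.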

For an example of how to manipulate GeoTIFF metadata, see the benchmark blog post.

Metadata

Metadata is also included as a CSV, train_metadata.csv. It contains the following columns:

chip_id (string): A unique identifier for each chip. There is one row per chip in each of the metadata files.
location (string): General location of the chip, either a country (e.g., Eswatini), a metropolitan region (e.g., Lusaka), or a country sub-region (e.g., Australia - Central).
datetime (datetime64[ns, UTC]): Date and time that the images in the chip were captured. These will be loaded as strings with the format %Y-%m-%dT%H:%M:%SZ (using standard string format codes). The trailing Z indicates that timestamps are in Coordinated Universal Time (UTC).
cloudpath (string): The path to download the folder of chip images from the Azure Blob Storage container. For a step-by-step guide, see data_download_instructions.txt on the data download page.
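The sketch below parses the datetime column with the standard library, using the format string given above. `parse_chip_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # format of the datetime column

def parse_chip_datetime(value):
    """Parse a train_metadata.csv datetime string into a timezone-aware UTC datetime."""
    # strptime matches the literal "Z" but does not attach a timezone, so set UTC explicitly
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)

print(parse_chip_datetime("2019-11-12T11:02:20Z").isoformat())
# 2019-11-12T11:02:20+00:00
```

With pandas, `pd.to_datetime(df["datetime"], utc=True)` produces the `datetime64[ns, UTC]` dtype listed above.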

The first few rows of train_metadata.csv are:

	chip_id	location	datetime	cloudpath
0	adwp	Chifunfu	2020-04-29T08:20:47Z	az://./train_features/adwp
1	adwu	Chifunfu	2020-04-29T08:20:47Z	az://./train_features/adwu
2	adwz	Chifunfu	2020-04-29T08:20:47Z	az://./train_features/adwz
3	adxp	Chifunfu	2020-04-29T08:20:47Z	az://./train_features/adxp
4	aeaj	Chifunfu	2020-04-29T08:20:47Z	az://./train_features/aeaj

Chifunfu is an area of Tanzania! There are up to 400 chips for each location in the data. Each location appears in either the train set or the test set, never both, so every location in the test set is unseen during training.
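Test locations never appear in training. A local validation split that holds out whole locations should therefore mimic the test setting more closely than a random chip-level split. Below is a minimal sketch in plain Python. The 20% fraction, the seed, and the helper name `split_by_location` are arbitrary choices for illustration.

```python
import random

def split_by_location(rows, val_fraction=0.2, seed=0):
    """Split (chip_id, location) rows so that no location appears in both sets."""
    locations = sorted({loc for _, loc in rows})
    rng = random.Random(seed)
    rng.shuffle(locations)
    # hold out at least one whole location for validation
    n_val = max(1, round(len(locations) * val_fraction))
    val_locs = set(locations[:n_val])
    train = [chip for chip, loc in rows if loc not in val_locs]
    val = [chip for chip, loc in rows if loc in val_locs]
    return train, val
```

scikit-learn's `GroupKFold` or `GroupShuffleSplit`, with the location column as the groups, implements the same idea.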

Feature data example

Feature information for the chip in the training set with ID cjge (taken from Bechar, Algeria)

Metadata

chip_id	location	datetime	cloudpath
cjge	Bechar	2019-11-12T11:02:20Z	az://./train_features/cjge

Feature images

band	description	filepath
B02	Blue visible light	train_features/cjge/B02.tif
B03	Green visible light	train_features/cjge/B03.tif
B04	Red visible light	train_features/cjge/B04.tif
B08	Near infrared light	train_features/cjge/B08.tif

B02.tif (blue visible light band)




array([[1102, 1250, 1324, ..., 2928, 2902, 2802],
       [1160, 1252, 1326, ..., 2854, 2844, 2754],
       [1270, 1338, 1348, ..., 2732, 2800, 2814],
       ...,
       [1086, 1100, 1072, ..., 1200, 1188, 1076],
       [1118, 1118, 1082, ..., 1168, 1190, 1140],
       [1158, 1140, 1086, ..., 1222, 1208, 1148]], dtype=uint16)

B04.tif (red visible light band)




array([[3134, 3350, 3492, ..., 4074, 4036, 3978],
       [3190, 3386, 3466, ..., 4064, 4040, 3996],
       [3232, 3420, 3488, ..., 4016, 4016, 4016],
       ...,
       [2714, 2678, 2654, ..., 2942, 2968, 2822],
       [2778, 2732, 2688, ..., 2930, 2996, 2822],
       [2812, 2724, 2654, ..., 2924, 2958, 2886]], dtype=uint16)

Each TIF is a single-band image. The shape of each image array is (512, 512).

Additional data

While four bands per chip are included in the competition data, the publicly available Sentinel-2 data includes additional bands capturing other wavelengths (the instrument records 13 spectral bands in total). You may pull in any other information from the Planetary Computer to supplement the provided data. We recommend using the Planetary Computer STAC API to access any additional data. Access to the Planetary Computer will be allowed during inference in the code execution environment. To find additional bands for a given chip, search the Planetary Computer catalog using both the chip's geographic coordinates and its timestamp.

For example code demonstrating how to pull in an additional band, see the tutorial posted in the Planetary Computer Hub.

The competition dataset includes the four bands with a resolution of 10 m and comes exclusively from the Level-2A (L2A) product. Some chips may not have every band available.
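Below is a minimal sketch of the coordinate-and-timestamp search described above. It assumes three things: the `pystac-client` and `planetary-computer` packages, the `sentinel-2-l2a` collection ID, and a ±30-minute search window, which is our own choice. `time_window` and `find_items` are hypothetical helper names.

```python
from datetime import datetime, timedelta

STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # same format as the metadata datetime column

def time_window(timestamp, minutes=30):
    """Return a STAC 'start/end' interval of +/- `minutes` around a chip timestamp."""
    t = datetime.strptime(timestamp, TIME_FORMAT)
    delta = timedelta(minutes=minutes)
    return f"{(t - delta).strftime(TIME_FORMAT)}/{(t + delta).strftime(TIME_FORMAT)}"

def find_items(tif_path, timestamp, collection="sentinel-2-l2a"):
    """Search Planetary Computer items overlapping a chip near its capture time."""
    # imported lazily so time_window() works without these packages installed
    import planetary_computer
    import pystac_client
    import rasterio
    from rasterio.warp import transform_bounds

    with rasterio.open(tif_path) as f:
        # the STAC API expects the bbox in longitude/latitude (EPSG:4326)
        bbox = transform_bounds(f.crs, "EPSG:4326", *f.bounds)
    catalog = pystac_client.Client.open(
        STAC_URL, modifier=planetary_computer.sign_inplace  # signs asset URLs for reading
    )
    search = catalog.search(
        collections=[collection], bbox=bbox, datetime=time_window(timestamp)
    )
    return list(search.items())
```

For example, `find_items("train_features/cjge/B04.tif", "2019-11-12T11:02:20Z")` returns matching items whose assets are keyed by band name (e.g. `item.assets["B11"].href`). More than one item may overlap a chip. An extra band read this way would still need to be cropped to the chip's 512 x 512 grid, and resampled if its native resolution is coarser than 10 m (B11 is 20 m).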

Note on external data: External data that is not from the Planetary Computer is not allowed in this competition. Participants can use pre-trained computer vision models as long as they were available freely and openly in that form at the start of the competition.

Labels

The labels for the competition are 512 x 512 GeoTIFFs with pixels indicating cloud (1) / no cloud (0) for each chip. Each label is saved as <chip_id>.tif in the train_labels folder. For example, listing the first few files in train_labels in your terminal would give:

$ ls train_labels | head -n 5
adwp.tif
adwu.tif
adwz.tif
adxp.tif
aeaj.tif

Like the feature images, label GeoTIFFs contain additional metadata including bounding coordinates that can be accessed with rasterio. Any missing values in the labels have been converted to 0, or no cloud, during validation. None of the cloud cover masks in the competition dataset are publicly available in the Planetary Computer.

Labelled training data example

cjge.tif

An example TIF image with the cloud cover ground truth label. About half of the image is covered in clouds.

array([[0, 0, 0, ..., 1, 1, 1],
       [0, 0, 0, ..., 1, 1, 1],
       [0, 0, 0, ..., 1, 1, 1],
       ...,
       [1, 1, 1, ..., 0, 0, 0],
       [1, 1, 1, ..., 0, 0, 0],
       [0, 1, 1, ..., 0, 0, 0]], dtype=uint8)

Each label TIF is a single-band image. The shape of each image array is (512, 512).
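The mean of a 0/1 mask gives the fraction of a chip that is cloudy. This is useful for checking a description like "about half" above, or for looking at class balance across the training set. Below is a minimal numpy sketch. `cloud_fraction` is a hypothetical helper, and its value check reflects the statement that labels contain only 0 (no cloud) and 1 (cloud).

```python
import numpy as np

def cloud_fraction(label):
    """Fraction of pixels labeled cloud (1) in a 0/1 mask."""
    label = np.asarray(label)
    # labels should contain only 0 (no cloud) and 1 (cloud)
    if not np.isin(label, (0, 1)).all():
        raise ValueError("label contains values other than 0 and 1")
    return float(label.mean())
```

The mask itself can be loaded with `rasterio.open("train_labels/cjge.tif").read(1)`.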

Performance metric

To measure your model’s performance, we’ll use a metric called the Jaccard index, also known as Intersection over Union (IoU). The Jaccard index is a similarity measure between two label sets. In this case, it is defined as the size of the intersection divided by the size of the union of non-missing pixels. In this competition there should be no missing data. Because it is an accuracy metric, a higher value is better. The Jaccard index can be calculated as follows:

$$J(A, B) = \frac{|A\cap B|}{|A\cup B|} = \frac{|A\cap B|}{|A| + |B| - |A\cap B|}$$

where $A$ is the set of pixels labeled as cloud in the ground truth, $B$ is the set of pixels predicted as cloud, and $|\cdot|$ denotes the number of pixels in a set.
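Here is a tiny worked example of the formula, using hypothetical 2 x 2 masks. It shows that both forms of the denominator agree.

```python
import numpy as np

# Hypothetical 2 x 2 masks: 1 = cloud, 0 = no cloud
actual = np.array([[1, 1], [0, 0]])  # A: ground-truth cloud pixels
pred = np.array([[1, 0], [1, 0]])    # B: predicted cloud pixels

inter = np.logical_and(actual, pred).sum()     # |A ∩ B| = 1
union = np.logical_or(actual, pred).sum()      # |A ∪ B| = 3
alt_union = actual.sum() + pred.sum() - inter  # |A| + |B| - |A ∩ B| = 2 + 2 - 1
print(inter / union)                           # 1/3
```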

In Python, you can calculate the Jaccard index with the scikit-learn function sklearn.metrics.jaccard_score(y_true, y_pred, average='binary'), passing the label and prediction masks flattened to 1-D arrays (for example, with .ravel()). Below is code that demonstrates how you might calculate the Jaccard index across all chips in Python without using sklearn:

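import numpy as np

def jaccard_over_chips(file_pairs):
    """Jaccard index (IoU) pooled over all chips.

    file_pairs: iterable of (pred, actual) tuples of 512x512 arrays of 0s and 1s.
    """
    # get the sum of intersection and union over all chips
    intersection = 0
    union = 0
    for pred, actual in file_pairs:
        intersection += np.logical_and(actual, pred).sum()
        union += np.logical_or(actual, pred).sum()

    # calculate the score across all chips
    return intersection / union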

In the above, pred and actual are each a 512 x 512 array containing only 0s and 1s. file_pairs is a list of tuples where each tuple is (pred, actual).
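The loop above pools intersections and unions over all chips before dividing. That is not the same as averaging per-chip scores, because chips with larger unions (more cloud) carry more weight. A hypothetical two-chip example illustrates the difference. `pooled_iou` and `mean_per_chip_iou` are illustrative names.

```python
import numpy as np

def pooled_iou(pairs):
    """Sum intersections and unions over chips, then divide once."""
    inter = sum(np.logical_and(a, p).sum() for p, a in pairs)
    union = sum(np.logical_or(a, p).sum() for p, a in pairs)
    return inter / union

def mean_per_chip_iou(pairs):
    """Average of per-chip scores (assumes every chip has a non-empty union)."""
    return float(np.mean([np.logical_and(a, p).sum() / np.logical_or(a, p).sum()
                          for p, a in pairs]))

chip1 = (np.ones((2, 2)), np.ones((2, 2)))              # (pred, actual): perfect match, union 4
chip2 = (np.zeros((2, 2)), np.array([[1, 0], [0, 0]]))  # one missed cloud pixel, union 1
pairs = [chip1, chip2]
print(pooled_iou(pairs), mean_per_chip_iou(pairs))      # 0.8 0.5
```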

Submission format

This is a code execution challenge! Rather than submitting your predicted labels, you’ll package everything needed to do inference at the chip level and submit that for containerized execution. Your code must be able to generate predictions in the form of single-band 512x512 pixel TIFs. The lone band should consist of 1 (cloud) and 0 (no cloud) pixel values. An example prediction for one chip could look like the following:

Prediction format example

An example TIF image with the predicted cloud cover mask

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 1, 1, 1],
       [0, 0, 0, ..., 1, 1, 1],
       [0, 0, 0, ..., 1, 1, 1]], dtype=uint8)

Each prediction TIF is a single-band image. The shape of each image array is (512, 512).
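Models typically output per-pixel scores rather than 0/1 values, so predictions need to be thresholded before they are written. The sketch below is minimal and assumes a 0.5 threshold. It also reuses the georeferencing of one of the chip's input bands, which is our choice. The required output file names and locations are defined on the Code Submission Format page, so `out_path` is a placeholder. `to_binary_mask` and `write_prediction` are hypothetical helper names.

```python
import numpy as np

def to_binary_mask(probs, threshold=0.5):
    """Convert per-pixel cloud scores to a uint8 mask of 1 (cloud) / 0 (no cloud)."""
    probs = np.asarray(probs)
    if probs.shape != (512, 512):
        raise ValueError(f"expected a (512, 512) array, got {probs.shape}")
    return (probs >= threshold).astype(np.uint8)

def write_prediction(mask, reference_tif, out_path):
    """Write a single-band uint8 GeoTIFF reusing the profile of reference_tif."""
    import rasterio  # imported here so to_binary_mask works without rasterio

    with rasterio.open(reference_tif) as src:
        profile = src.profile
    # one uint8 band; drop nodata so that 0 still means "no cloud"
    profile.update(count=1, dtype="uint8", nodata=None)
    with rasterio.open(out_path, "w", **profile) as dst:
        dst.write(mask, 1)
```

Setting `nodata=None` matters here. The input bands declare `nodata: 0.0`, and copying that value would mark every no-cloud pixel as missing.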

For full instructions on packaging your submission for code execution, see the Code Submission Format page.


Good luck and enjoy the challenge! If you have any questions, you can always visit the user forum!

Winner: adityakumarsinha