On Cloud N: Cloud Cover Detection Challenge

Clouds obscure important ground-level features in satellite images, complicating their use in downstream applications. Build algorithms for cloud cover detection using a new cloud dataset and Microsoft's Planetary Computer! #science

$20,000 in prizes
Feb 2022
848 joined

Code submission format


This is a code submission challenge! Rather than submitting your predicted labels, you'll package everything needed to do inference and submit that for containerized execution. If you want to learn more about how our code execution competitions work, check out our blog post.

Developing your model

The competition runtime container is designed to be compatible with Microsoft's Planetary Computer containers. Your inference code should be able to run successfully in the Planetary Computer environment. There are a few strategies for developing a model that is compatible:

  • Develop your model within the Planetary Computer Hub, which provides a hosted Jupyter Notebook in one of the Planetary Computer environments. All of the competition data is already available within the Planetary Computer Hub in a read-only attached volume. A version of the benchmark blog post is available in the Planetary Computer Hub as a fully executable notebook, which you can also download to your local machine.

  • Create a virtual environment on your local machine (for example, with conda) that mirrors the Planetary Computer Hub, and develop your model in that environment. If you are using PyTorch, you can do this by installing a subset of the packages specified in the environment.yml associated with the Planetary Computer's "GPU - PyTorch" environment.

To request access to the Planetary Computer Hub, fill out this form and include "DrivenData" in your area of study.

What to submit

Your final submission should be a zip archive named with the extension .zip (for example, submission.zip). Submission requirements:

  • The root level of submission.zip contains a main.py that performs inference on all of the test chips in /codeexecution/data/test_features and writes predictions in the form of single-band 512x512 pixel .tifs into the /codeexecution/predictions folder. Be sure that when you unzip your submission, main.py exists at the root level of the unzipped folder and not in a subdirectory.

  • The prediction .tifs that you generate consist of 1 (cloud) and 0 (no cloud) pixel values and have data type uint8. File names should match the chip IDs from the test dataset. For example, if the test set includes a chip with ID abcd, running main.py must write out a predicted cloud cover TIF mask to /codeexecution/predictions/abcd.tif. (A quick format check is sketched after this list.)

  • The submission contains any model weights that need to be loaded. There will be no network access besides the Planetary Computer STAC API.

  • main.py loads all data for inference from the read-only /codeexecution/data/test_features and (optionally) the Planetary Computer STAC API.

  • main.py executes successfully in the competition runtime, within the limitations described in the runtime repository.
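
The example main.py below shows one way to write predictions that satisfy these requirements. As a quick sanity check before zipping your submission, you can also read one of your generated files back and confirm its shape, data type, and pixel values. The following is a minimal sketch (not part of the official checks); the chip ID abcd is the hypothetical example from above.

import numpy as np
from tifffile import imread

# Read back one generated prediction (hypothetical chip ID "abcd")
pred = imread("/codeexecution/predictions/abcd.tif")

assert pred.shape == (512, 512), "predictions must be single-band 512x512"
assert pred.dtype == np.uint8, "predictions must have data type uint8"
assert set(np.unique(pred)) <= {0, 1}, "pixel values must be 0 (no cloud) or 1 (cloud)"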

All other structure of submission.zip is up to you. The runtime repository contains a submission based on the benchmark, which serves as an example of how the submission source code can be structured and how to create a .zip from the submission files. Once the benchmark submission.zip is unzipped, it contains:

.
├── main.py  # Inference script - this is the only required file
├── assets  # Directory containing the saved model assets
│   └── cloud_model.pt  # Example of saved weights for the trained model
├── cloud_dataset.py  # The last three are utility scripts that are imported by main.py
├── cloud_model.py    # (Not all submissions will have these)
└── losses.py

Only main.py above is required. The rest of the Python scripts above (cloud_dataset.py, cloud_model.py, and losses.py) are all specific to how the benchmark submission is structured; your submission can include whatever extra scripts you like.
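
If you are unsure how to produce an archive with main.py at the root, the following sketch (using only the Python standard library; file names follow the benchmark structure above) zips the submission files without a wrapping folder.

import zipfile
from pathlib import Path

# Files to package, following the benchmark structure above
submission_files = [
    "main.py",
    "assets/cloud_model.pt",
    "cloud_dataset.py",
    "cloud_model.py",
    "losses.py",
]

with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for name in submission_files:
        # arcname keeps paths relative to the archive root, so main.py is not nested
        zf.write(Path(name), arcname=name)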

See the runtime repository and the benchmark blog post for the full benchmark example.

Scoring process

When your submission is scored in the competition runtime, submission.zip will be unzipped into a folder called codeexecution and run in our cloud compute cluster. The codeexecution folder then contains the following files:

/codeexecution
├── data
│   ├── test_features  <-- read chips from this directory
│   │   ├── aaaa  <-- your code makes predictions for each chip
│   │   │   ├── B02.tif
│   │   │   ├── B03.tif
│   │   │   ├── B04.tif
│   │   │   └── B08.tif
│   │   ├── ...
│   │   └── zzzu
│   │       ├── B02.tif
│   │       ├── B03.tif
│   │       ├── B04.tif
│   │       └── B08.tif
│   └── test_metadata.csv
├── main.py  <-- your code submission main.py and any additional assets
├── ...  <-- additional assets from your submission.zip
├── predictions  <-- empty folder where you will save your test predictions as tifs, e.g. aaaa.tif
└── submission
    └── log.txt  <-- log messages emitted while running your code

The test images will be available in data/test_features. Within test_features, there is a folder for each chip_id in the test data. Each chip folder contains a TIF image corresponding to each band as <band_name>.tif. For example, the B02 band (blue visible light) of a chip with ID abcd would be saved at data/test_features/abcd/B02.tif. For details about which bands are available, see the problem description.

Supplementary data

For this competition, you may pull in any other information from the Planetary Computer Hub to supplement the provided data. We recommend using the Planetary Computer STAC API to access any additional data. The use of supplementary input for training and/or inference is optional.

For example code demonstrating how to pull in an additional band, see the tutorial posted in the Planetary Computer Hub.
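
As an illustration only (the tutorial above is the authoritative reference), the sketch below uses the pystac-client and planetary-computer packages to query the Planetary Computer STAC API; the collection, bounding box, date range, and asset key are hypothetical placeholders.

import planetary_computer
import pystac_client
import rasterio

# Open the Planetary Computer STAC API
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1"
)

# Search an illustrative collection over a hypothetical chip footprint and date range
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[30.0, -1.0, 30.1, -0.9],
    datetime="2020-01-01/2020-01-31",
)
items = list(search.get_items())

if items:
    # Sign the item so its asset URLs are readable, then open one extra band
    signed = planetary_computer.sign(items[0])
    with rasterio.open(signed.assets["B01"].href) as src:
        extra_band = src.read(1)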

Example main.py

The following example main.py generates correctly formatted (but very inaccurate!) predictions for each test chip and saves them to the predictions folder.

from pathlib import Path
import numpy as np
from tifffile import imsave, imread

ROOT_DIRECTORY = Path("/codeexecution")
PREDICTIONS_DIRECTORY = ROOT_DIRECTORY / "predictions"
INPUT_IMAGES_DIRECTORY = ROOT_DIRECTORY / "data/test_features"

BANDS = ["B02", "B03", "B04", "B08"]

chip_ids = (
    pth.name for pth in INPUT_IMAGES_DIRECTORY.iterdir() if not pth.name.startswith(".")
)

for chip_id in chip_ids:
    band_arrs = []
    for band in BANDS:
        band_arr = imread(INPUT_IMAGES_DIRECTORY / f"{chip_id}/{band}.tif")
        band_arrs.append(band_arr)
    chip_arr = np.stack(band_arrs)
    # could do something useful here with chip_arr ;-)
    prediction = np.zeros((512, 512), dtype="uint8")
    output_path = PREDICTIONS_DIRECTORY / f"{chip_id}.tif"
    imsave(output_path, prediction)

Testing your submission locally

If you'd like to replicate how your submission will run online, you can test the submission locally first. This is recommended to work out any bugs and ensure that your model inference will run quickly enough.

Runtime

Runtime means the particular hardware and software context in which code runs, including the specific versions of the operating system, drivers, software packages, etc.

Your code is executed within a container that is defined in our runtime repository. The limits are as follows:

  • Your submission must be written in Python using the packages defined in the runtime repository.

  • The submission must complete execution in 4 hours or less. We expect most submissions to complete much more quickly; computation time per participant will be monitored to prevent abuse.

  • The container runtime has access to a single GPU. All of your code should run within the GPU environments in the container, even if actual computation happens on the CPU. (CPU environments are provided within the container for local debugging only).

  • The container has access to 5 vCPUs powered by an Intel Xeon E5-2690 chip and 48GB RAM.

  • The container has 1 Tesla K80 GPU with 12GB of memory.

  • The container execution will not have root access to the filesystem.

  • The container will block all internet access, except to the Planetary Computer STAC API.

The GPUs for executing your inference code are a shared resource across competitors. We ask that you be conscientious in your use of them. Please add progress information to your logs and cancel jobs that will run longer than the time limit. Canceled jobs won't count against your submission limit, and canceling frees up resources to score submissions that will complete on time.
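
For example, progress logging might look like the minimal sketch below (using the standard logging module; the chip loop mirrors the example main.py above, and the time budget is an arbitrary margin under the 4-hour limit).

from pathlib import Path
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("inference")

INPUT_IMAGES_DIRECTORY = Path("/codeexecution/data/test_features")
TIME_BUDGET_SECONDS = 3.5 * 60 * 60  # arbitrary margin under the 4-hour limit

chip_ids = sorted(
    pth.name for pth in INPUT_IMAGES_DIRECTORY.iterdir() if not pth.name.startswith(".")
)
start = time.monotonic()

for i, chip_id in enumerate(chip_ids, start=1):
    # ... run inference for chip_id and write its prediction .tif here ...
    if i % 100 == 0 or i == len(chip_ids):
        elapsed = time.monotonic() - start
        projected = elapsed / i * len(chip_ids)
        logger.info(
            "%d/%d chips done in %.0fs (projected total %.0fs)",
            i, len(chip_ids), elapsed, projected,
        )
        if projected > TIME_BUDGET_SECONDS:
            logger.warning("Projected runtime exceeds the time budget; consider canceling this job.")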

Requesting package installations

Since the Docker container will not have network access beyond the Planetary Computer STAC API, all packages must be pre-installed. We are happy to add packages as long as they do not conflict and the image still builds successfully.

To request that an additional package be added to the Docker image, follow the instructions in the runtime repository and make sure to edit both the CPU and GPU versions. You can test this by following the instructions to build the Docker images locally before opening your pull request.