STAC Overflow: Map Floodwater from Radar Imagery Hosted By Microsoft AI for Earth


Problem description

In this challenge, you will be detecting the presence of floodwater in synthetic-aperture radar (SAR) imagery. The primary data consist of satellite imagery captured between 2016 and 2020 from different regions around the world. You may optionally supplement this data with information on permanent water sources and/or elevation, available through the Microsoft AI for Earth STAC API. Your goal is to build a model that classifies the presence or absence of water on a pixel-by-pixel basis for each chip in the test set.


This dataset consists of Sentinel-1 radar images stored as GeoTIFFs. Additionally, you are given a set of metadata for the training set that contains country and date information for each chip.

Training set


The features in this dataset are the radar images themselves. While radar can be difficult to interpret visually, it is especially useful for detecting features through vegetation, cloud coverage, and low lighting. There is one image per band and two bands per chip. Each image is 512 x 512 pixels and is stored as a GeoTIFF. Each pixel in a radar image represents the energy that was reflected back to the satellite, measured in decibels (dB). Pixel values may be negative or positive; a value of exactly 0.0 indicates missing data.

Sentinel-1 is a phase-preserving dual polarization SAR system, meaning that it can transmit and receive a signal in both horizontal and vertical polarizations. Different polarizations can be used to bring out different physical properties in a scene. The data for this challenge includes two microwave frequency readings: VV (vertical transmit, vertical receive) and VH (vertical transmit, horizontal receive).

For each chip, you will use one or both of these bands to detect floodwater. Please note you will only be evaluated on predictions made for valid input pixels. Pixels with missing data in the radar imagery will be excluded during scoring.

The figure below illustrates how radars transmit and receive polarized energy by applying specific filters.


The first symbol indicates the direction of transmission and the second indicates the direction of reception.
Image Credit: Remote Sensing of the Environment.

The training set consists of 542 chips (1084 images) from 13 flood events. You can access the training images by downloading the zip archive from the data download page.

The images are named {image_id}.tif, which is equivalent to {chip_id}_{polarization}.tif. A chip_id consists of a three-letter flood_id and a two-digit chip_number. Each chip has a _vv and a _vh polarization band. For example, awc05_vv.tif represents the VV band for chip number 05 from flood event awc.
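As a small illustration of this naming convention, the helper below (hypothetical, not part of the challenge materials) splits an image filename into its parts:

```python
from pathlib import Path

def parse_image_id(filename):
    """Split an image filename like 'awc05_vv.tif' into its parts,
    following the {flood_id}{chip_number}_{polarization}.tif convention."""
    stem = Path(filename).stem                    # e.g. "awc05_vv"
    chip_id, polarization = stem.rsplit("_", 1)   # "awc05", "vv"
    flood_id, chip_number = chip_id[:3], chip_id[3:]
    return {
        "chip_id": chip_id,
        "flood_id": flood_id,
        "chip_number": chip_number,
        "polarization": polarization,
    }

print(parse_image_id("awc05_vv.tif"))
```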

If you had the train_features directory in data/, then listing the first few examples would give:

$ ls data/train_features | head -n 6

Here is a side-by-side example of a VV and VH band for a single chip, visualized with arbitrary colors:


Each GeoTIFF contains a set of metadata including bounding coordinates, an affine transform, and its coordinate reference system (CRS) projection. In Python, you can easily access geospatial raster data and extract this metadata using the rasterio package.

import rasterio

with rasterio.open(image_path) as f:  # image_path points to a chip's GeoTIFF
    meta = f.meta
    bounds = f.bounds

To access a masked array indicating which pixels are missing, use the masked flag when reading in the data; in the resulting mask, True denotes an invalid pixel.

with rasterio.open(image_path) as f:
    masked_arr = f.read(1, masked=True)
    data = masked_arr.data
    mask = masked_arr.mask

Supplementary data

Information about a geography's natural topography and permanent water sources may also help your model to better detect floodwater. You may supplement your input imagery with:

  • elevation data from the NASA Digital Elevation Model (NASADEM)
  • global surface water data from the European Commission's Joint Research Centre (JRC), including map layers for seasonality, occurrence, change, recurrence, transitions, and extent

During training, these datasets can be accessed through the Planetary Computer STAC API by searching for images that overlap an area during a specific time.
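As a minimal sketch of assembling such a query, the function below builds search parameters for a chip, assuming you already have its bounds (from rasterio) and its date (from the training metadata). The collection ids "nasadem" and "jrc-gsw" and the open-ended datetime range are assumptions to verify against the Planetary Computer catalog:

```python
def stac_search_params(bounds, date):
    """Build STAC search kwargs for one chip.

    bounds: (left, bottom, right, top) in the image's CRS (assumed lon/lat here)
    date:   the chip's acquisition date as an ISO string, e.g. "2017-08-15"
    """
    left, bottom, right, top = bounds
    return {
        # Assumed collection ids; check the Planetary Computer catalog.
        "collections": ["nasadem", "jrc-gsw"],
        "bbox": [left, bottom, right, top],
        "datetime": f"../{date}",  # open-ended range up to the chip's date
    }

params = stac_search_params((88.2, 25.0, 88.3, 25.1), "2017-08-15")
# These kwargs could then be passed to pystac_client, e.g.:
#   from pystac_client import Client
#   catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
#   items = list(catalog.search(**params).items())
print(params)
```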

Note on external data: External data is not allowed in this competition. Participants can use pre-trained computer vision models as long as they were available freely and openly in that form at the start of the competition.

Test set

The test set images are only accessible in the runtime container and are mounted at data/test_features. Test set images are not georeferenced.

Supplementary data on elevation (NASADEM) and permanent water (JRC) are available at inference time without network access. Data files corresponding to each test image's geography are provided as static resources in the code execution environment.


Labels

Each chip corresponds with a single label, stored as a GeoTIFF. A label is a 512 x 512 pixel mask indicating which pixels in a scene contain water, where:

  • 1 indicates the presence of water
  • 0 indicates the absence of water
  • 255 indicates missing data

Labels are named {chip_id}.tif.

Each set of two polarization bands (VV and VH) corresponds with a single label.
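As an illustration of this label encoding, the snippet below masks out the 255 (missing) pixels of a small synthetic label before computing a statistic; with real data the array would be read with rasterio:

```python
import numpy as np

# Synthetic 2 x 3 label: 1 = water, 0 = no water, 255 = missing.
label = np.array([[1, 0, 255],
                  [1, 1, 0]], dtype=np.uint8)

valid = label != 255                  # True where the label is usable
water_fraction = label[valid].mean()  # fraction of valid pixels that are water
print(water_fraction)                 # 3 water pixels / 5 valid pixels = 0.6
```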

Performance metric

To measure your model’s performance, we’ll use a metric called the Jaccard index, also known as Generalized Intersection over Union (IoU). The Jaccard index is a similarity measure between two label sets. In this case, it is defined as the size of the intersection divided by the size of the union of non-missing pixels; predictions on missing data are excluded from the computation. Because it is an accuracy metric, a higher value is better. The Jaccard index is calculated as follows:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$

where $A$ is the set of true pixels and $B$ is the set of predicted pixels.

In Python, you can easily calculate the Jaccard index using the scikit-learn function sklearn.metrics.jaccard_score(y_true, y_pred, average='binary'). Below is some pseudocode that demonstrates how you might calculate the Jaccard index in Python:

intersection = 0
union = 0

for pred, actual in file_pairs:
    valid = actual != 255  # keep only pixels with valid labels
    actual = actual[valid]
    pred = pred[valid]

    intersection += np.logical_and(actual, pred).sum()
    union += np.logical_or(actual, pred).sum()

iou = intersection / union
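To sanity-check the loop above, here is a small worked example on synthetic masks (missing pixels already removed) comparing the manual intersection-over-union against scikit-learn's jaccard_score:

```python
import numpy as np
from sklearn.metrics import jaccard_score

# Tiny synthetic ground truth and prediction (1 = water, 0 = no water).
actual = np.array([1, 1, 0, 0, 1])
pred   = np.array([1, 0, 0, 1, 1])

intersection = np.logical_and(actual, pred).sum()  # 2 pixels water in both
union = np.logical_or(actual, pred).sum()          # 4 pixels water in either
iou_manual = intersection / union                  # 2 / 4 = 0.5

iou_sklearn = jaccard_score(actual, pred, average="binary")
print(iou_manual, iou_sklearn)                     # both 0.5
```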

Submission format

This is a code execution challenge! Rather than submitting your predicted labels, you’ll package everything needed to perform inference at the chip level and submit that for containerized execution. Your code must be able to generate predictions in the form of single-band 512 x 512 pixel TIFFs. The lone band should consist of 1 (water) and 0 (no water) pixel values. Pixels with missing data in the radar imagery will be excluded during scoring. An example prediction would look like the following:
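As a minimal sketch of the expected output format, the array below has the required shape, a single band, and only 0/1 values; writing it out as a TIFF (e.g. with rasterio) is not shown:

```python
import numpy as np

# A conforming prediction: one 512 x 512 band of 0/1 values.
prediction = np.zeros((512, 512), dtype=np.uint8)
prediction[100:200, 100:200] = 1  # hypothetical patch of predicted water

print(prediction.shape, np.unique(prediction))  # (512, 512) [0 1]
```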

Example Label

See complete details on making your executable code submission here.


If you’re wondering how to get started, check out our benchmark blog post.

Good luck and enjoy this challenge! If you have any questions you can always visit the user forum.