On Cloud N: Cloud Cover Detection Challenge Hosted By Microsoft AI for Earth


STAC Resources

This challenge uses publicly available satellite data from the Sentinel-2 mission, which captures wide-swath, high-resolution, multispectral imagery. The data are publicly shared through Microsoft's Planetary Computer.

You may pull in any other information from the Planetary Computer to supplement the provided data. This page provides an overview of how data in the Planetary Computer is organized and how to access additional data.

Pulling in additional data is entirely optional; you can also build a model based solely on the satellite imagery provided through the competition.

For a code example showing how to pull in an additional band, see the tutorial posted in the Planetary Computer Hub.

STAC specification

The Planetary Computer organizes the datasets it hosts using the SpatioTemporal Asset Catalog (STAC) specification. STAC is a standard for organizing geospatial assets and their metadata, which makes it easy to search for data that match spatial, temporal, or other criteria.

There are three primary components that together make up the STAC specification.

  • An Item represents a single spatiotemporal asset: a unit of data and metadata containing information about the Earth captured at a certain place and time. Items are GeoJSON Features and can reference imagery, SAR data, data cubes, or full-motion video. An Item's properties may include its spatial and temporal extent, band descriptions, label types, or other information such as citation details.

  • A Catalog provides links to Items or to other Catalogs. It can be thought of as a container, similar to a folder in a file system. In a nested Catalog structure, the root is simply the top-level Catalog, the one without a parent.

  • Finally, a Collection shares most fields with a Catalog but adds fields such as license, extent, providers, keywords, and summaries. Like Catalogs, Collections provide structure, but they generally contain a set of assets that share higher-level metadata (e.g., images from the same sensor).

Planetary Computer STAC API

Microsoft provides a STAC API that can be used to search the datasets hosted on the Planetary Computer. A STAC API is the dynamic, searchable counterpart of a static STAC. It returns a STAC Catalog, Collection, Item, or STAC API ItemCollection. Catalog and Collection objects are JSON documents, while Item and ItemCollection objects are GeoJSON-compliant (a Feature and a FeatureCollection, respectively). These objects include spatial and temporal information along with links to the 'child' and 'parent' objects they reference, which makes it easy to traverse the tree.
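Every STAC object declares its kind in a `type` field: `Catalog` and `Collection` for the JSON objects, and `Feature` and `FeatureCollection` for the GeoJSON ones. A client can therefore tell what an API response is from that field, and find where to go next from its `links`. The sketch below uses only the standard library. `stac_kind`, `link_hrefs`, and the trimmed landing-page document with `example.com` hrefs are hypothetical illustrations, not part of any library.

```python
from typing import Any

# Map the STAC/GeoJSON "type" field to the STAC object it denotes.
STAC_KINDS = {
    "Catalog": "Catalog",
    "Collection": "Collection",
    "Feature": "Item",  # an Item is a GeoJSON Feature
    "FeatureCollection": "ItemCollection",  # e.g. a page of search results
}


def stac_kind(obj: dict[str, Any]) -> str:
    """Return which STAC object a decoded JSON response represents."""
    kind = STAC_KINDS.get(obj.get("type"))
    if kind is None:
        raise ValueError(f"unrecognized STAC type: {obj.get('type')!r}")
    return kind


def link_hrefs(obj: dict[str, Any], rel: str) -> list[str]:
    """Collect the hrefs of all links with the given relation, e.g. "child" or "parent"."""
    return [link["href"] for link in obj.get("links", []) if link.get("rel") == rel]


# Hypothetical, trimmed-down landing page of a STAC API (itself a Catalog).
landing_page = {
    "type": "Catalog",
    "id": "example-api",
    "links": [
        {"rel": "self", "href": "https://example.com/stac"},
        {"rel": "child", "href": "https://example.com/stac/collections/a"},
        {"rel": "child", "href": "https://example.com/stac/collections/b"},
    ],
}
print(stac_kind(landing_page), link_hrefs(landing_page, "child"))
```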

Using the PySTAC library created by Azavea, you can load, traverse, and access data within these STACs programmatically. This quickstart guide demonstrates how to search for data using the STAC API with PySTAC. For additional resources on PySTAC, check out the intro to PySTAC blog post, documentation, and tutorials.

To get a SAS token for accessing the data assets referenced by the STAC API, use the Planetary Computer’s Data Authentication API. Alternatively, you can use the planetary-computer package to generate tokens and sign asset HREFs for access.
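A SAS token is a query string appended to an asset's storage URL. The sketch below shows the idea using only the standard library. The token endpoint URL is an assumption based on the Data Authentication API and should be checked against the Planetary Computer documentation. `fetch_sas_token` and `sign_href` are hypothetical helpers; in practice, the planetary-computer package's signing functions do this for you.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the Data Authentication API; verify against the Planetary Computer docs.
SAS_TOKEN_URL = "https://planetarycomputer.microsoft.com/api/sas/v1/token/{collection}"


def fetch_sas_token(collection: str) -> str:
    """Request a short-lived SAS token for one collection (needs network access)."""
    with urllib.request.urlopen(SAS_TOKEN_URL.format(collection=collection)) as resp:
        # The response is assumed to be JSON with a "token" field.
        return json.load(resp)["token"]


def sign_href(href: str, token: str) -> str:
    """Append a SAS token (a URL query string) to an asset HREF."""
    token = token.lstrip("?")
    parts = urllib.parse.urlsplit(href)
    # Keep any existing query parameters and add the token's parameters after them.
    query = f"{parts.query}&{token}" if parts.query else token
    return urllib.parse.urlunsplit(parts._replace(query=query))
```

A typical call would be `sign_href(asset_href, fetch_sas_token("sentinel-2-l2a"))`, where `sentinel-2-l2a` is assumed to be the collection the chips come from.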

Access to the Planetary Computer will be allowed during inference in the code execution environment. To find additional bands for a given chip, search the Planetary Computer STAC API using both the chip's geographic coordinates and its timestamp.
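Here is a sketch of that search. It assumes the pystac-client and planetary-computer packages are installed and that the chips come from the `sentinel-2-l2a` collection. `search_params` turns a chip's centre point and capture time into a bounding box and a datetime interval. The 0.01 degree buffer and the 30-minute window on either side are hypothetical defaults, not competition values. `search_items` imports its dependencies lazily, so the parameter builder can be used without them.

```python
from datetime import datetime, timedelta, timezone

# Root URL of the Planetary Computer STAC API.
STAC_API_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"
COLLECTION = "sentinel-2-l2a"  # assumed collection id for the chips' imagery


def search_params(lon: float, lat: float, timestamp: datetime,
                  buffer_deg: float = 0.01,
                  window: timedelta = timedelta(minutes=30)) -> dict:
    """Build STAC search parameters around a chip's centre point and capture time.

    buffer_deg and window are hypothetical defaults; tune them to your chips.
    """
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    bbox = [lon - buffer_deg, lat - buffer_deg, lon + buffer_deg, lat + buffer_deg]
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start = (timestamp - window).astimezone(timezone.utc).strftime(fmt)
    end = (timestamp + window).astimezone(timezone.utc).strftime(fmt)
    # STAC accepts a "start/end" interval string for the datetime filter.
    return {"collections": [COLLECTION], "bbox": bbox, "datetime": f"{start}/{end}"}


def search_items(params: dict) -> list:
    """Run the search; needs network access plus pystac-client and planetary-computer."""
    # Imported here so search_params works even without these packages installed.
    import planetary_computer
    import pystac_client

    client = pystac_client.Client.open(
        STAC_API_URL,
        modifier=planetary_computer.sign_inplace,  # signs asset HREFs with SAS tokens
    )
    return list(client.search(**params).items())
```

For example, `search_items(search_params(10.0, 45.0, datetime(2020, 6, 1, 10, 0, tzinfo=timezone.utc)))` would return the matching Items with signed asset HREFs. An additional band can then be read through `item.assets[...]`.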