Open Cities AI Challenge: Segmenting Buildings for Disaster Resilience Hosted By DrivenData

Working with STACs

Below are examples from the train and test STACs for this competition as well as some starter code for programmatically working with STACs using PySTAC.

Training data STACs

train_tier_1 and train_tier_2 STACs have identical formats but describe different data (tier 1 and 2 training data, respectively).

Sample file structure

train_tier_1/
├── catalog.json
├── acc
│   ├── collection.json
│   ├── 665946
│   │   ├── 665946.json
│   │   └── 665946.tif
│   ├── 665946-labels
│   │   ├── 665946-labels.json
│   │   └── 665946.geojson
│   ├── a42435
│   │   ├── a42435.json
│   │   └── a42435.tif
│   ├── a42435-labels
│   │   ├── a42435-labels.json
│   │   └── a42435.geojson
│   ├── ...
│
├── dar
│   ├── collection.json
│   ├── b15fce
│   │   ├── b15fce.json
│   │   └── b15fce.tif
│   ├── b15fce-labels
│   │   ├── b15fce-labels.json
│   │   └── b15fce.geojson
│   ├── ...
│
├── ...

Each of these catalogs contains one collection for each region included in that subset of the training data. Within each region are the COGs and GeoJSON files, which are represented as STAC Items and LabelItems, respectively. The JSON files (e.g. catalog.json, collection.json, b15fce.json) include spatial and temporal information about the objects and assets below them. They also reference their 'child' and 'parent' objects, enabling you to easily traverse the file tree.
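The directory convention in the sample file structure (an image item directory named by its id, next to a directory named id-labels) can also be walked with the standard library alone. The following is a minimal sketch rather than part of the competition starter code. The function name pair_images_and_labels is hypothetical, and it assumes a region directory has been downloaded to local disk with exactly the layout shown in the sample tree.

```python
from pathlib import Path


def pair_images_and_labels(region_dir):
    """Return sorted (image_tif, label_geojson) path pairs for one region directory.

    Assumes the layout from the sample tree: <id>/<id>.tif and
    <id>-labels/<id>.geojson. Items missing either file are skipped.
    """
    region_dir = Path(region_dir)
    pairs = []
    for item_dir in sorted(p for p in region_dir.iterdir() if p.is_dir()):
        if item_dir.name.endswith("-labels"):
            continue  # label directories are reached through their image item
        item_id = item_dir.name
        image = item_dir / f"{item_id}.tif"
        labels = region_dir / f"{item_id}-labels" / f"{item_id}.geojson"
        if image.is_file() and labels.is_file():
            pairs.append((image, labels))
    return pairs
```

Calling pair_images_and_labels("train_tier_1/acc") on a local copy would then yield one pair per scene in the acc region, which is a convenient starting point for building a training index.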

Niamey tier 1 collection JSON

{
    "id": "nia",
    "stac_version": "0.8.1",
    "description": "Tier 1 training data from nia",
    "links": [
        {
            "rel": "item",
            "href": "./825a50/825a50.json",
            "type": "application/json"
        },
        {
            "rel": "item",
            "href": "./825a50-labels/825a50-labels.json",
            "type": "application/json"
        },
        {
            "rel": "root",
            "href": "../catalog.json",
            "type": "application/json"
        },
        {
            "rel": "parent",
            "href": "../catalog.json",
            "type": "application/json"
        }
    ],
    "extent": {
        "spatial": {
            "bbox": [
                [
                    2.000607112710697,
                    13.570445755015827,
                    2.0089069498293726,
                    13.583969811536608
                ]
            ]
        },
        "temporal": {
            "interval": [
                [
                    "2019-10-29T00:00:00Z",
                    null
                ]
            ]
        }
    },
    "license": "various",
    "stac_extensions": [
        "label"
    ]
}

STAC objects must include a valid datetime string, but temporal information was not available for all images. The date "2019-10-29" was used as a placeholder for all scenes whose date of capture was unavailable.
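If capture dates matter for your analysis, it may help to flag items that carry the placeholder. The helper below is a small sketch operating on an Item's dictionary form; the name has_placeholder_date is hypothetical. Note that it cannot tell a placeholder apart from a scene that was genuinely captured on that date.

```python
PLACEHOLDER_DATE = "2019-10-29"  # placeholder date stated in the text above


def has_placeholder_date(item):
    """True if a STAC Item dict carries the placeholder capture date.

    Only the date part is compared, so both "2019-10-29T00:00:00Z" and
    "2019-10-29 00:00:00Z" match.
    """
    datetime_str = item.get("properties", {}).get("datetime") or ""
    return datetime_str[:10] == PLACEHOLDER_DATE
```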

A peek inside the Niamey tier 1 collection JSON shows how STAC encodes the metadata for this pair of image and label. The links within all STACs for this competition are relative and self-contained. For ease of use, the assets are located alongside the STAC JSON files in the file tree. This means that a COG or GeoJSON file will always be located in the same directory as its corresponding item JSON file.
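Because assets sit next to their item JSON, a relative asset href can be resolved against the item file's own directory. Here is a minimal sketch of that idea for a local copy of a STAC. The function name local_asset_path is hypothetical, and the code assumes the href is relative (as in the on-disk JSON files), not an absolute URL.

```python
from pathlib import Path


def local_asset_path(item_json_path, item, asset_key="image"):
    """Resolve an asset's relative href against the directory of its item JSON.

    `item` is the parsed item JSON (a dict). The default asset key "image"
    is the one used by the image items shown in this document.
    """
    href = item["assets"][asset_key]["href"]
    return Path(item_json_path).parent / href
```

For an item file at test/2b70d4/2b70d4.json whose image asset href is ./2b70d4.tif, this resolves to test/2b70d4/2b70d4.tif.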

Test data STAC

The test STAC covers the chips in the test set. It consists of one catalog and 11,481 Items (one for each chip). It is simpler than the training data STACs because all the image items link directly to the root catalog. There are also, naturally, no LabelItems.

As in the training data STACs, each test set chip COG is in its own directory along with its STAC Item JSON.

Sample of test chips file tree

test/
├── catalog.json
├── 2b70d4
│   ├── 2b70d4.json
│   └── 2b70d4.tif
├── d5b448
│   ├── d5b448.json
│   └── d5b448.tif
│
├── ...

The test chip Item JSON files function like regular STAC image Items but reveal no accurate spatiotemporal information about the chip. The timestamp and georeferencing information are set to the same generic values for all chips.

Sample test chip item

{
    "type": "Feature",
    "stac_version": "0.8.1",
    "id": "2b70d4",
    "properties": {
        "datetime": "2019-10-29 00:00:00Z"
    },
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [
                    0.0,
                    0.0
                ],
                [
                    0.0,
                    0.045
                ],
                [
                    0.045,
                    0.045
                ],
                [
                    0.045,
                    0.0
                ],
                [
                    0.0,
                    0.0
                ]
            ]
        ]
    },
    "bbox": [
        0.0,
        0.0,
        0.045,
        0.045
    ],
    "links": [
        {
            "rel": "root",
            "href": "../catalog.json",
            "type": "application/json"
        },
        {
            "rel": "parent",
            "href": "../catalog.json",
            "type": "application/json"
        }
    ],
    "assets": {
        "image": {
            "href": "./2b70d4.tif",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "title": "GeoTIFF"
        }
    }
}
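Since every test chip shares the same generic georeferencing (a bbox from 0.0 to 0.045 in both directions), a quick consistency check between an item's polygon ring and its bbox can catch malformed metadata. The sketch below works on an Item's dictionary form; the function names ring_bounds and bbox_matches_geometry are hypothetical, and the sample values are typed in by hand for illustration.

```python
def ring_bounds(item):
    """Return [min_x, min_y, max_x, max_y] of a Polygon item's exterior ring."""
    ring = item["geometry"]["coordinates"][0]
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return [min(xs), min(ys), max(xs), max(ys)]


def bbox_matches_geometry(item, tol=1e-9):
    """True if the item's bbox equals the bounds of its polygon ring within tol."""
    return all(abs(a - b) <= tol for a, b in zip(item["bbox"], ring_bounds(item)))


# Hand-written sample using the generic square described above.
chip = {
    "bbox": [0.0, 0.0, 0.045, 0.045],
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [[0.0, 0.0], [0.0, 0.045], [0.045, 0.045], [0.045, 0.0], [0.0, 0.0]]
        ],
    },
}
print(bbox_matches_geometry(chip))  # True
```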

Programmatically accessing STACs with PySTAC

Using the PySTAC library created by Azavea, you can load, traverse, and access data within these STACs programmatically. For example:

Reading Catalogs

from pystac import Catalog

train1_cat = Catalog.from_file('https://drivendata-competition-building-segmentation.s3-us-west-1.amazonaws.com/train_tier_1/catalog.json')

train2_cat = Catalog.from_file('https://drivendata-competition-building-segmentation.s3-us-west-1.amazonaws.com/train_tier_2/catalog.json')

test_cat = Catalog.from_file('https://drivendata-competition-building-segmentation.s3-us-west-1.amazonaws.com/test/catalog.json')

train1_cat.describe()
* <Catalog id=train_tier_1>
    * <Collection id=acc>
      * <Item id=665946>
      * <LabelItem id=665946-labels>
      * <Item id=a42435>
      * <LabelItem id=a42435-labels>
      * <Item id=ca041a>
      * <LabelItem id=ca041a-labels>
      * <Item id=d41d81>
      * <LabelItem id=d41d81-labels>
    * <Collection id=mon>
      * <Item id=401175>
      ...
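
Beyond printing the tree, you will often want a flat lookup of image locations. The following is a hedged sketch, not part of the starter code: the function name image_hrefs is hypothetical, it assumes a PySTAC release that provides Catalog.get_all_items(), and it assumes image items expose their COG under the asset key "image" as in the item JSON shown in this document.

```python
def image_hrefs(catalog):
    """Map item id -> href of its "image" asset for every item under a catalog.

    `catalog` is expected to behave like a PySTAC Catalog. Items without an
    "image" asset are skipped (assumed here to include the label items).
    """
    return {
        item.id: item.assets["image"].href
        for item in catalog.get_all_items()
        if "image" in item.assets
    }
```

With the catalogs loaded as above, image_hrefs(train1_cat) would walk every collection in turn. Expect this to take a while on the remote catalogs, since each item JSON is fetched over the network.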

Displaying Item properties

one_item = train1_cat.get_child(id='acc').get_item(id='ca041a')
one_item.to_dict()
{
  "assets": {
    "image": {
      "href": "https://drivendata-competition-building-segmentation.s3-us-west-1.amazonaws.com/train_tier_1/acc/ca041a/ca041a.tif",
      "title": "GeoTIFF",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized"
    }
  },
  "bbox": [
    -0.22707525357332697,
    5.585527399115482,
    -0.20581415249279408,
    5.610742610987594
  ],
  "collection": "acc",
  "geometry": {
    "coordinates": [
      [
        [
          -0.2260939759101167,
          5.607821019807083
        ],
        ...
        [
          -0.2260939759101167,
          5.607821019807083
        ]
      ]
    ],
    "type": "Polygon"
  },
  "id": "ca041a",
  "links": [
    {
      "href": "../collection.json",
      "rel": "collection",
      "type": "application/json"
    },
    {
      "href": "https://drivendata-competition-building-segmentation.s3-us-west-1.amazonaws.com/train_tier_1/acc/ca041a/ca041a.json",
      "rel": "self",
      "type": "application/json"
    },
    {
      "href": "../../catalog.json",
      "rel": "root",
      "type": "application/json"
    },
    {
      "href": "../collection.json",
      "rel": "parent",
      "type": "application/json"
    }
  ],
  "properties": {
    "area": "acc",
    "datetime": "2018-11-12 00:00:00Z",
    "license": "CC BY 4.0"
  },
  "stac_version": "0.8.1",
  "type": "Feature"
}
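
The relative hrefs in the links list can be turned into absolute URLs by resolving them against the item's "self" link. This is a minimal standard-library sketch; the function name absolute_links is hypothetical, and it assumes each rel value appears only once, as it does in the item above.

```python
from urllib.parse import urljoin


def absolute_links(item):
    """Return {rel: absolute href} for a STAC Item dict that has a "self" link."""
    links = item["links"]
    self_href = next(link["href"] for link in links if link["rel"] == "self")
    # urljoin leaves already-absolute hrefs unchanged and resolves "../" segments.
    return {link["rel"]: urljoin(self_href, link["href"]) for link in links}
```

Applied to one_item.to_dict(), the "parent" entry would resolve to the collection.json of the acc collection and the "root" entry to the train_tier_1 catalog.json.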

Additional resources

For a more in-depth starter on using PySTAC combined with GeoPandas and Rasterio to access the competition data, see this Colab notebook:

More info on PySTAC: