# Evaluation stage

This page outlines the data and submission format for Stage 2: Model Evaluation. In this stage, you'll package everything needed to perform inference on new data each week, and submit weekly predictions through the winter and spring season to see how your solution performs! A live leaderboard will track performance as new ground truth labels become available.

| Date submissions scored | Latest column scored |
| --- | --- |
| 2022-05-16 | 2022-05-12 |

## Data

The features for this competition include remote sensing data, snow measurements captured by volunteer networks, climate information, and a narrow set of ground measures.

For model training, you are given snow water equivalent (SWE) measures for 2013-2021 for 1km x 1km grid cells. SWE values for 2013-2019 were provided in train_labels.csv in Stage 1. Data for 2020-2021 is made available on the Stage 2 data download page in labels_2020_2021.csv. Note that this means you may use approved data from both training and test periods from Stage 1 as training data for Stage 2 submissions. You will then use your model to make weekly predictions for 2022 for the grid cells identified in submission_format.csv.

### Approved features (inputs)

All data sources and access locations detailed below are pre-approved for both model training and inference. These data have undergone a careful selection process and are freely and publicly available. Note that only the listed data products and access locations are pre-approved for use. You must go through an approved access location when using these sources for inference.

If you are interested in using additional data sources or access locations, see the process for requesting additional data sources below.

#### Ground measures

Ground measures can help to provide regularly collected, highly accurate point estimates of SWE at designated stations.

SNOTEL: The Snow Telemetry (SNOTEL) program consists of automated and semi-automated data collection sites across the Western U.S.

CDEC: The California Data Exchange Center (CDEC) facilitates the collection, storage, and exchange of hydrologic and climate information to support real-time flood management and water supply needs in California. CDEC operates data collection sites similar to SNOTEL within California.

Ground-based sites from SNOTEL and CDEC are used both as an optional input data source and in ground truth labels for this competition. Sites used for evaluation are entirely distinct from those in the features data.

Approved data access location: When using these sources, you are only permitted to use the data contained in the provided ground measures csvs.

Historical ground measures data: Ground measures data from 2013-2019 and 2020-2021 was provided in ground_measures_train_features.csv and ground_measures_test_features.csv, respectively, in Stage 1.

Real time ground measures data: The ground_measures_features.csv file on the data download page will contain weekly updates for ground measure sites beginning on January 13, 2022. This file will be updated weekly during the evaluation period for use in weekly predictions. Updated measurements will be available by the end of the day to which they apply.

Solutions may not use any other historical ground measure data as input features during inference, as solutions should be able to estimate SWE in locations where there is no historical ground station or ASO data available.

In addition, ground_measures_metadata.csv contains metadata about the SNOTEL and CDEC sites, with six columns:

• station_id (str): unique site identifier
• name (str): site name
• elevation_m (float): elevation in meters
• latitude (float): latitude
• longitude (float): longitude
• state (str): state

#### Remote sensing data

The features for this competition include remote sensing images from satellites. These data sources provide regular monitoring of land surfaces.

MODIS Terra MOD10A1 and Aqua MYD10A1: MODIS/Terra and MODIS/Aqua Snow Cover Daily L3 Global 500m SIN Grid. Terra's orbit around the Earth is timed so that it passes from north to south across the equator in the morning, while Aqua passes south to north over the equator in the afternoon. Snow-covered land typically has very high reflectance in visible bands and very low reflectance in shortwave infrared bands. The Normalized Difference Snow Index (NDSI) reveals the magnitude of this difference. The snow cover algorithm calculates NDSI for all land and inland water pixels in daylight using MODIS band 4 (visible green) and band 6 (shortwave near-infrared).
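The NDSI described above is the normalized difference between the two bands. The following is a minimal sketch, assuming reflectance values for band 4 and band 6 have already been extracted; the function name `ndsi` and the sample reflectances are illustrative and are not taken from the MODIS products.

```python
def ndsi(green: float, swir: float) -> float:
    """Normalized Difference Snow Index from band 4 (green) and band 6 (SWIR) reflectance."""
    total = green + swir
    if total == 0:
        return float("nan")  # the index is undefined when both reflectances are zero
    return (green - swir) / total


# Illustrative values only: snow is bright in visible green and dark in shortwave infrared.
snow_like = ndsi(green=0.8, swir=0.1)
bare_like = ndsi(green=0.2, swir=0.3)
```

Higher values indicate a larger gap between visible and shortwave infrared reflectance, which is the signature of snow cover described above.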

Landsat 8 Collection 2 Level-2: Landsat, a joint NASA/USGS program, provides the longest continuous space-based record of Earth’s land in existence. Landsat Collection 2 includes scene-based global Level-2 surface reflectance and surface temperature science products.

Sentinel 2 Level-2A: Sentinel-2 is a wide-swath, high-resolution, multi-spectral imaging mission, supporting the monitoring of vegetation, soil and water cover, as well as observation of inland waterways and coastal areas. The Sentinel-2 Multispectral Instrument (MSI) samples 13 spectral bands: four bands at 10 metres, six bands at 20 metres and three bands at 60 metres spatial resolution. The mission provides global coverage of the Earth's land surface every 5 days, making the data of great use in ongoing studies.

Sentinel 1 Ground Range Detected (GRD) Data: The Sentinel-1 mission is a constellation of C-band Synthetic Aperture Radar (SAR) satellites from the European Space Agency, with launches beginning in 2014. These satellites collect observations of radar backscatter intensity day or night, regardless of the weather conditions, making them enormously valuable for environmental monitoring. These radar data have been detected, multi-looked, and projected to ground range using an Earth ellipsoid model. Ground range coordinates are the slant range coordinates projected onto the ellipsoid of the Earth, where pixel values represent detected magnitude.

Sentinel 1 Terrain Corrected Data: The Google Earth Engine Sentinel-1 collection includes GRD scenes that have been processed to generate a calibrated, ortho-corrected product. Each GRD scene has one of three resolutions (10, 25, or 40 meters), four band combinations, and three instrument modes. Each scene was processed using thermal noise removal, radiometric calibration, and terrain correction using SRTM 30 or ASTER DEM for areas greater than 60 degrees latitude, where SRTM is not available. The final terrain-corrected values are converted to decibels via log scaling.
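The log scaling mentioned above is the standard decibel conversion, 10 * log10(x). The sketch below shows the conversion and its inverse for a single linear backscatter value; the function names are hypothetical helpers, not part of Earth Engine.

```python
import math


def to_decibels(backscatter: float) -> float:
    """Convert a linear backscatter value to decibels: 10 * log10(x)."""
    if backscatter <= 0:
        raise ValueError("backscatter must be positive to take a logarithm")
    return 10.0 * math.log10(backscatter)


def from_decibels(db: float) -> float:
    """Invert the decibel conversion to recover the linear value."""
    return 10.0 ** (db / 10.0)
```

Converting back to linear units can be useful before averaging pixels, since the mean of decibel values is not the decibel value of the mean.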

#### Climate data

HRRR: The High-Resolution Rapid Refresh (HRRR) is a NOAA real-time 3-km resolution, hourly updated, cloud-resolving, convection-allowing atmospheric model, initialized by 3km grids with 3km radar assimilation.

• Approved data access locations:
  • https://noaahrrr.blob.core.windows.net/hrrr
  • s3://noaa-hrrr-bdp-pds/
• Planetary Computer resources:
• AWS resources:
• Data product details

#### Digital surface data

Tip: when there are multiple access point locations for a data source, download from the one that is geographically closest to you to reduce download times.

Copernicus DEM (90 meter resolution): The Copernicus Digital Elevation Model (DEM) is a digital surface model (DSM), which represents the surface of the Earth including buildings, infrastructure, and vegetation. This DSM is based on radar satellite data acquired during the TanDEM-X Mission.

Climate Research Data Package (CRDP) Land Cover Gridded Map (300 meter resolution): The CRDP land cover map (2020) classifies land surface into 22 classes, which have been defined using the United Nations Food and Agriculture Organization's Land Cover Classification System (LCCS). This map is based on data from the Medium Resolution Imaging Spectrometer (MERIS) sensor on board the polar-orbiting Envisat-1 environmental research satellite by the European Space Agency. This data comes from the CCI-LC database hosted by the ESA Climate Change Initiative's Land Cover project.

• Approved data access locations:
  • s3://drivendata-public-assets/land_cover_map.tar.gz (US)
  • s3://drivendata-public-assets-eu/land_cover_map.tar.gz (Europe)
  • s3://drivendata-public-assets-asia/land_cover_map.tar.gz (Asia)

CRDP Water Bodies Map (150 meter resolution): The CRDP water bodies map (2000) classifies areas into land and water. This map is based on the Envisat ASAR water bodies indicator from the Global Forest Change and the Global Inland Water data products.

• Approved data access locations:
  • s3://drivendata-public-assets/water_bodies_map.tar.gz (US)
  • s3://drivendata-public-assets-eu/water_bodies_map.tar.gz (Europe)
  • s3://drivendata-public-assets-asia/water_bodies_map.tar.gz (Asia)

CRDP Burned Areas Occurrence Map (500 meter resolution): The CRDP burned areas occurrence map presents the percentage of burned areas occurrence as detected over the 2000-2012 period on a 7-day basis. Data originate from the GFEDv3 dataset. This data product is composed of two series of 52 layers (one per week).

• Approved data access locations:
  • s3://drivendata-public-assets/burned_areas_occurrence_map.tar.gz (US)
  • s3://drivendata-public-assets-eu/burned_areas_occurrence_map.tar.gz (Europe)
  • s3://drivendata-public-assets-asia/burned_areas_occurrence_map.tar.gz (Asia)

FAO-UNESCO Global Soil Regions Map: The global soil regions map (2005) shows the global distribution of the 12 soil orders according to the Soil Taxonomy. This map is based on a reclassification of the FAO-UNESCO Soil Map of the World combined with a soil climate map.

• Approved data access locations:
  • s3://drivendata-public-assets/soil_regions_map.tar.gz (US)
  • s3://drivendata-public-assets-eu/soil_regions_map.tar.gz (Europe)
  • s3://drivendata-public-assets-asia/soil_regions_map.tar.gz (Asia)

Additional data sources may be explored and incorporated during model training. However, only pre-approved data sources are allowed for generating predictions during real-time evaluation.

If you would like for any additional sources to be approved for use during inference, you are welcome to submit an official request form and the challenge organizers will review the request. Only select sources that demonstrate a strong case for use will be considered.

To qualify for approval, data sources must meet the following minimum requirements:

• Be freely and publicly available to all participants
• Be produced reliably by an operational data product
• Not incorporate SNOTEL or Airborne Snow Observatory (ASO) data
• Provide clear value beyond existing approved sources

Any requests to add approved data sources must be received by January 18 to be considered. An announcement will be made to all challenge participants if your data source has been approved for use.

### Labels (outputs)

The labels are the SWE measurements in inches for each grid cell collected over a set period of time. Labels for 2013-2019 were provided in Stage 1. For the Model Evaluation Stage (Stage 2), we provide labels_2020_2021.csv, which contains columns for:

• cell_id (type: str): grid cell ID
• YYYY-MM-DD (type: str): each collection date (UTC)

Training labels are provided for December 1 to June 30 each year from 2020 to 2021. Labels are derived from a combination of ground-based snow telemetry (i.e., SNOTEL and CDEC sites) and airborne measurements (i.e., ASO data). Additional training labels from 2013-2019 are available through the Development Stage.

Keep in mind that SWE is a cumulative process, meaning temporal (serial) correlation is expected.
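Because the labels file has one column per collection date, it can be convenient to reshape it into one record per cell and date before modeling. The sketch below uses only the standard library and a made-up two-cell excerpt; the cell IDs and values are hypothetical, and it assumes that dates without a measurement appear as empty fields.

```python
import csv
import io

# Hypothetical excerpt in the wide layout described above; the values are invented.
sample = io.StringIO(
    "cell_id,2020-01-07,2020-01-14\n"
    "cell-a,5.2,6.0\n"
    "cell-b,,1.5\n"
)


def wide_to_long(handle):
    """Yield (cell_id, date, swe) records, skipping dates with no measurement."""
    reader = csv.DictReader(handle)
    for row in reader:
        cell_id = row.pop("cell_id")
        for date, value in row.items():
            if value == "":
                continue  # assumption: an empty field means no label for that date
            yield cell_id, date, float(value)


records = list(wide_to_long(sample))
```

Keeping the records in date order per cell makes it easier to account for the serial correlation noted above, for example when building lagged features.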

grid_cells.geojson contains the coordinates and region information for all grid cells in the submission_format.csv. It has the following keys:

• crs: Coordinate Reference System (CRS) for grid cells (value of EPSG:4326/WGS84 CRS)
• features: Contains cell_id, region, and geometry for each grid cell. Region options include "sierras", "central rockies", and "other" and will be used to calculate regional prizes.
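As a rough sketch of how this file might be read, the example below parses a single hypothetical feature and extracts its ID, region, and bounding-box center. It assumes that cell_id and region are stored under each feature's properties key and that each geometry is a single polygon, which is the usual GeoJSON layout but is not confirmed here; the coordinates are invented.

```python
import json

# Hypothetical one-cell excerpt; the real file holds many features.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "crs": {"type": "name", "properties": {"name": "EPSG:4326"}},
  "features": [
    {
      "type": "Feature",
      "properties": {"cell_id": "cell-a", "region": "sierras"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[-119.0, 37.0], [-118.99, 37.0], [-118.99, 37.01],
                         [-119.0, 37.01], [-119.0, 37.0]]]
      }
    }
  ]
}
""")


def cell_summary(feature):
    """Return (cell_id, region, (lon, lat) center of the cell's bounding box)."""
    props = feature["properties"]
    ring = feature["geometry"]["coordinates"][0]  # exterior ring of (lon, lat) pairs
    lons = [point[0] for point in ring]
    lats = [point[1] for point in ring]
    center = ((min(lons) + max(lons)) / 2, (min(lats) + max(lats)) / 2)
    return props["cell_id"], props["region"], center


cells = [cell_summary(feature) for feature in sample["features"]]
```

The center point is one simple way to look up co-located values in the raster data sources; a library such as GeoPandas would be a reasonable alternative for heavier geometry work.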

## Submission format

The format for the submission file is a .csv with columns for cell_id and each weekly collection date for which SWE predictions are needed. The time range includes January - June 2022. Predictions for swe must be floating point number values. For example, 1.0 is a valid float, but 1 is not.

For the Model Evaluation Stage, your score will be based on the accumulated predictions over the course of the evaluation period. Here's how it will work:

• For each day requiring estimation as indicated in the submission format, submissions may be made up to three days following the start of that day. For example, if you are submitting predictions for 2022-01-13, you may submit your predictions up to 23:59 UTC on 2022-01-15. As always, submissions must only use approved data collected up through the day the estimation applies to. Use of any future data after the day of estimation is prohibited.
• For future dates, you may keep predictions as missing data (NaN) until you are ready to submit predictions for that week. You may submit predictions for future dates if you choose to. Any predictions for past dates beyond three days will be ignored.
• If you miss a week, then that week will be automatically populated in the following order:
  • Using the most recent prediction for that week, if you predicted for future dates
  • Otherwise copying the previous week's prediction for that grid cell
  • If there is no previous prediction for that grid cell, using a no-snow estimate (0.0)
• During the practice period in Stage 2a, the score will be based on performance for predictions up until Feb 15. During Stage 2b, these predictions will be ignored and only predictions after Feb 15 will be considered.
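The fallback order for missed weeks can be sketched as follows for a single grid cell. This is an illustration of the rules as written above, not the platform's actual implementation; the function name is hypothetical, and the input is assumed to be the cell's weekly values in date order, with NaN wherever no prediction was ever submitted.

```python
import math


def fill_missed_weeks(values):
    """Apply the missed-week rules to one grid cell's weekly values, in date order."""
    filled = []
    for value in values:
        if not math.isnan(value):
            filled.append(value)       # a prediction (possibly made ahead of time) exists
        elif filled:
            filled.append(filled[-1])  # copy the previous week's value for this cell
        else:
            filled.append(0.0)         # no previous prediction: no-snow estimate
    return filled


nan = float("nan")
example = fill_missed_weeks([nan, 1.5, nan, 2.0, nan])
```

In this example the first week falls back to 0.0, and each later gap repeats the most recent value before it.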

Illustration of how a sample submission for the week of 2022-01-27 is merged into your accumulated predictions used in scoring. Frozen columns from the submission are ignored, while editable columns are used to update your eval predictions. All rows for the current week must contain a predicted value.
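The merge described here can be approximated in a few lines. The sketch below treats a column as editable until 23:59 UTC two days after its date, matching the 2022-01-13 example above, and ignores frozen columns. The function names and the per-cell dictionary layout are hypothetical, and it assumes that NaN entries in a submission leave existing values untouched, which the rules do not state explicitly.

```python
import math
from datetime import date, datetime, timedelta, timezone


def is_editable(column: date, submitted_at: datetime) -> bool:
    """True if a prediction for `column` is still accepted at `submitted_at`."""
    deadline_day = column + timedelta(days=2)  # 2022-01-13 closes at 23:59 UTC on 2022-01-15
    return submitted_at.astimezone(timezone.utc).date() <= deadline_day


def merge_submission(accumulated, submission, submitted_at):
    """Update one cell's {date: value} predictions from a new submission."""
    merged = dict(accumulated)
    for column, value in submission.items():
        if is_editable(column, submitted_at) and not math.isnan(value):
            merged[column] = value  # frozen columns and NaN entries are skipped
    return merged


accumulated = {date(2022, 1, 13): 1.0, date(2022, 1, 20): 2.0}
submission = {
    date(2022, 1, 13): 9.0,  # frozen: ignored
    date(2022, 1, 20): 9.0,  # frozen: ignored
    date(2022, 1, 27): 3.0,  # current week
    date(2022, 2, 3): 3.5,   # future week, submitted early
}
submitted_at = datetime(2022, 1, 28, 12, 0, tzinfo=timezone.utc)
merged = merge_submission(accumulated, submission, submitted_at)
```

Pass a timezone-aware `submitted_at`; a naive datetime would be interpreted in the local timezone by `astimezone`.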

Scoring note: For this stage, keep in mind that scores will not be updated as submissions are entered, but rather when ground truth values are brought into the platform for evaluation. This will always happen after submissions close for a given week. The latest scores will be displayed on the leaderboard.

#### Example

For example, if you predicted a swe of 1.0 inch for the first five cells for the first week, your predictions would look like the following:

| cell_id | 2022-01-13 | 2022-01-20 | ... | 2022-06-30 |
| --- | --- | --- | --- | --- |
| 00017271-ab2c-4d55-96e3-7dbc601dcefa | 1.0 | | ... | |
| 00088eed-4be5-48e1-9af1-1d2c50bdca01 | 1.0 | | ... | |
| 000dc53c-c3c5-4d10-8598-8ca378082526 | 1.0 | | ... | |
| 001b39fd-608f-4c2d-8368-0d650ca05e3b | 1.0 | | ... | |
| 001f39cd-64e8-4f1e-84d1-95f67041a6ee | 1.0 | | ... | |

And the first few rows and columns of the .csv file that you submit would look like:

cell_id,2022-01-13,2022-01-20
00017271-ab2c-4d55-96e3-7dbc601dcefa,1.0,
00088eed-4be5-48e1-9af1-1d2c50bdca01,1.0,
000dc53c-c3c5-4d10-8598-8ca378082526,1.0,
001b39fd-608f-4c2d-8368-0d650ca05e3b,1.0,
001f39cd-64e8-4f1e-84d1-95f67041a6ee,1.0,


You can see an example of the format that your submission must conform to, including headers and row names, in submission_format.csv.
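One way to produce a file in this shape is sketched below with the standard library. It writes every prediction with a decimal point, so an integer 1 becomes 1.0, and leaves NaN as an empty field, as in the sample rows above. The helper names are hypothetical; in practice the header and row order should be taken from submission_format.csv.

```python
import csv
import io
import math


def format_swe(value: float) -> str:
    """Render a prediction as a float literal; leave NaN as an empty field."""
    if math.isnan(value):
        return ""
    return repr(float(value))  # repr keeps the decimal point: 1 -> '1.0'


def write_submission(handle, dates, predictions):
    """Write a header row, then one row per cell_id with its values in date order."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["cell_id", *dates])
    for cell_id, values in predictions.items():
        writer.writerow([cell_id, *(format_swe(value) for value in values)])


buffer = io.StringIO()
write_submission(
    buffer,
    ["2022-01-13", "2022-01-20"],
    {"00017271-ab2c-4d55-96e3-7dbc601dcefa": [1, float("nan")]},
)
```

Writing to a real file works the same way by passing an open file handle in place of the `StringIO` buffer.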

In this challenge, predictions for a given grid cell may use predictions made for other grid cells if desired (i.e., it is not necessary for each test sample to be processed independently without the use of information from other cases in the test set).

## Code submission

As noted above, you will not be allowed to re-train your model when participating in Stage 2b. You will simply be applying your existing model to a new data set.

In order to be eligible for final prizes, you will need to submit a code directory as a compressed zip file with your source code and model weights prior to February 15, 2022 (the end of Stage 2a). The zip file must contain the following:

• All source code used to produce a submission, structured logically with an obvious point of entry. We recommend using a structure similar to our open source data science template for ease of sharing. This includes:
  • Training code used to produce the trained model
  • Inference code used to generate a submission for a given day and grid cell, using the provided model weights and specified data sources from approved access locations
• Model weights from the trained model, which are used to generate predictions using the inference code above.
• Extremely clear README with obvious instructions, dependencies and requirements identified, and clearly specifying what code needs to run to get to your submission from a fresh system with no dependencies. This includes:
  • All requirements needed for running submitted code, for instance in a requirements.txt file with versions specified or an environment.yml file
  • Data sources used in training and inference, including clear description of where and how they are used
  • Instructions for training the model from scratch using the training source code above
  • Instructions for running inference, using the inference source code and model weights above

Instructions should specify the series of commands, in order, that would get a reasonably savvy user from your code to a trained model or finished submission. Ideally, you will have a main point of entry to your code such as an executable script that runs all steps of the pipeline in a deterministic fashion. A well-constructed Jupyter Notebook or R script also meets this standard.
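As one possible shape for such an entry point, the skeleton below exposes separate train and predict commands and fixes a random seed for determinism. Everything in it is hypothetical, including the command names, flags, and the default weights path; the placeholder return values stand in for real pipeline steps.

```python
import argparse
import random


def build_parser():
    """Command-line interface with one subcommand per pipeline stage."""
    parser = argparse.ArgumentParser(description="Hypothetical pipeline entry point")
    commands = parser.add_subparsers(dest="command", required=True)
    train = commands.add_parser("train", help="train the model from scratch")
    train.add_argument("--seed", type=int, default=0)
    predict = commands.add_parser("predict", help="generate predictions for one date")
    predict.add_argument("--date", required=True, help="YYYY-MM-DD")
    predict.add_argument("--weights", default="model_weights.bin")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "train":
        random.seed(args.seed)  # fix the seed so repeated runs give the same result
        return f"train seed={args.seed}"  # placeholder for the real training step
    return f"predict date={args.date} weights={args.weights}"  # placeholder for inference
```

A script would typically call `main()` under the usual `if __name__ == "__main__":` guard, so the README can list commands such as `python run.py predict --date 2022-01-13` (the file name is also an assumption).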

Following this submission, if any additional code changes are needed during the live evaluation period to ensure successful weekly submissions, these will need to be tracked clearly. Again, no changes may be made to training code or model weights. In order to be eligible for prizes, prize finalists will be required to submit a final code repository which includes clear tracking (code diff) of any changes from their initial code submission and the weekly submissions they apply to.

Submit a link to your zip file on the code submission page.

## Performance metric

To measure your model’s performance, we’ll use a metric called Root Mean Square Error (RMSE), which is a measure of accuracy and quantifies differences between estimated and observed values. RMSE is the square root of the mean of the squared differences between the predicted values and the actual values. This is an error metric, so a lower value is better.

RMSE is defined as:

$$RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

where:

• $\hat{y}_i$ is the $i$th predicted value
• $y_i$ is the $i$th true value
• $N$ is the number of samples

This metric is implemented in scikit-learn as mean_squared_error, with the squared parameter set to False.
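For a quick local check without extra dependencies, RMSE can also be computed directly from its definition. This is a minimal sketch, with `rmse` as an illustrative helper name and made-up sample values.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# One prediction is off by 2.0 inches, so the score is sqrt(4 / 3).
score = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because errors are squared before averaging, a single large miss raises the score more than several small ones.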

For each submission, a secondary metric called the coefficient of determination $R^2$ will also be reported on the leaderboard for added interpretability. $R^2$ indicates the proportion of the variation in the dependent variable that is predictable from the independent variables. This is an accuracy metric, so a higher value is better.

$$R^2 = 1 - \frac{\text{residual sum of squared errors}}{\text{total sum of squared errors}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

• $n$ = number of values in the dataset
• $y_i$ = $i$th true value
• $\hat{y}_i$ = $i$th predicted value
• $\bar{y}$ = average of all true $y$ values
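The same definition translates directly into code. The sketch below is illustrative only, with `r_squared` as a hypothetical helper name and made-up sample values; it also shows that the score drops to zero when every prediction equals the mean of the true values and can go negative for worse predictions.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    total = sum((t - mean_true) ** 2 for t in y_true)
    if total == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - residual / total


truth = [1.0, 2.0, 3.0]
perfect = r_squared(truth, [1.0, 2.0, 3.0])        # residual 0
mean_only = r_squared(truth, [2.0, 2.0, 2.0])      # residual equals total
worse_than_mean = r_squared(truth, [1.0, 2.0, 5.0])  # residual 4, total 2
```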

While both RMSE and $R^2$ will be reported, only RMSE will be used to determine your official ranking and prize eligibility.

## Good luck!

Good luck and enjoy this problem! If you have any questions you can always visit the user forum!