Water Supply Forecast Rodeo: Development Arena

START HERE! Water managers in the Western U.S. rely on accurate water supply forecasts to better operate facilities and mitigate drought. Help the Bureau of Reclamation improve seasonal water supply estimates in this probabilistic forecasting challenge!

Approved data sources

Last updated: December 14, 2023

This page documents all approved data sources for use as feature data (i.e., predictor or input data) for your forecast models. Valid submissions must use only approved data sources as features. If you need additional data from an already approved source (e.g., from a longer time range), or if you would like to use data from another source, please refer to the section on requesting approval for additional data for more details.

As a reminder, for code execution in the Hindcast and Forecast Stage evaluations, you will be required to use copies of the feature data available within the runtime environment unless otherwise specified. For each data source, the following will be documented:

  • Approved data source — specific source that approved data will be downloaded from
  • Hindcast data available — details about what specific parameters of data will be available in code execution, if applicable
  • Direct API access permitted — listed only if you are permitted to pull data directly from this source's API during test set inference
  • Data download code — the code used by DrivenData to download data for the code execution runtime.
  • Sample data reading code — sample code that can be used to load the data. This will be made available as part of an installed package wsfr-read within the runtime environment.

Antecedent streamflow

NRCS and RFCs monthly naturalized flow

Description
Naturalized flow at the forecast sites from NRCS and the RFCs. These are the past monthly time series observations for the forecast target variable. See the problem description page for additional discussion.
Approved data source
See test_monthly_naturalized_flow.csv on the data download page.
Hindcast data available
Preceding October 1 through May or June for each forecast year in Hindcast test set, depending on the site's forecast season. Monthly data is only available for 23 of the 26 forecast sites.

USGS streamflow

Description
Daily observed streamflow measurements from the U.S. Geological Survey (USGS) recorded at USGS streamgages. These measurements represent actual observed flow of water at specific locations, and not the naturalized flow being forecasted. Solutions should not attempt to model the adjustments for calculating naturalized flow, as these are impacted by water management operations that may in general be influenced by forecasts. However, observed streamflow at the forecast sites or at other locations may still be useful predictors of the overall drainage basin condition. The runtime environment will include daily measurements for the specific USGS streamgages located at the forecast sites (available for 25 of the 26 sites; see the provided metadata.csv), but you are permitted to use data from any other location by directly accessing the USGS Water Services APIs. View additional details via the USGS Water Services API documentation.
Approved data source
USGS Water Services: (https://waterservices.usgs.gov/nwis)
Hindcast data available
Preceding October 1 through July 21 for each forecast year in Hindcast test set for gages at 25 of the 26 forecast sites.
Data download code (for gages at forecast sites)
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Direct API access permitted
For USGS locations other than the forecast sites, you are permitted to directly query the USGS Water Services APIs.
Sample data reading code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
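
The "preceding October 1 through July 21" window and a direct query for another gage could be sketched as follows. This is a minimal illustration, not the code DrivenData uses: the function names are hypothetical, the gage number is only a placeholder, and the query parameters (`sites`, `parameterCd`, `startDT`, `endDT`) follow the USGS Water Services daily values service, so check the API documentation before relying on them.

```python
from datetime import date
from urllib.parse import urlencode

# Daily values endpoint under the approved USGS Water Services base URL.
USGS_DV_URL = "https://waterservices.usgs.gov/nwis/dv/"


def hindcast_window(forecast_year: int) -> tuple[date, date]:
    """Return (preceding October 1, July 21) for a forecast year."""
    return date(forecast_year - 1, 10, 1), date(forecast_year, 7, 21)


def build_dv_url(site_no: str, forecast_year: int) -> str:
    """Build a daily-values query URL for one gage (hypothetical helper)."""
    start, end = hindcast_window(forecast_year)
    params = {
        "format": "json",
        "sites": site_no,
        "parameterCd": "00060",  # USGS parameter code for discharge
        "startDT": start.isoformat(),
        "endDT": end.isoformat(),
    }
    return f"{USGS_DV_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # "09380000" is a placeholder gage number used only for illustration.
    print(build_dv_url("09380000", 2015))
```

The sketch only builds the URL, so it can be checked without network access; the actual request can be made with any HTTP client.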

USBR reservoir inflow

Description
Metered or calculated data on inflow into U.S. Bureau of Reclamation (USBR) reservoirs. These inflow measurements represent actual observed flow of water, and not the naturalized flow being forecasted. Solutions should not attempt to model the adjustments for calculating naturalized flow, as these are impacted by water management operations that may in general be influenced by forecasts. However, observed flow at some locations may still be useful predictors of the overall drainage basin condition. View additional details via the USBR RISE API documentation.
Approved data source
USBR RISE API: (https://data.usbr.gov/rise/api)
Direct API access permitted
You are permitted to directly query reservoir inflow measurements from the USBR RISE API

Snowpack

NRCS SNOTEL

Description
The Snow Telemetry (SNOTEL) network is composed of over 900 automated data collection sites located in remote, high-elevation mountain watersheds in the Western U.S. They are used to monitor snowpack, precipitation, temperature, and other climatic conditions. You can read more about the SNOTEL network here. You can read more about the NRCS Air-Water Database (AWDB) Web Service here.
Approved data source
NRCS AWDB Web Service: SOAP (https://wcc.sc.egov.usda.gov/awdbWebService/services?WSDL), REST (https://wcc.sc.egov.usda.gov/awdbRestApi/services)
Hindcast data available
Preceding October 1 through July 21 for each forecast year in Hindcast test set for stations within 40 miles of the forecast site drainage basins
Data download code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Direct API access permitted
For other SNOTEL stations whose data is not downloaded, you are permitted to directly query the NRCS AWDB APIs for data.
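
If you query additional stations yourself, a distance filter similar in spirit to the 40-mile rule above might look like the sketch below. It is a simplification under stated assumptions: it measures great-circle distance to a single representative point rather than to the drainage basin polygon, and the station records, field names, and helper names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def stations_within(stations, point, max_miles=40.0):
    """Keep stations no farther than max_miles from a (lat, lon) point.

    Assumption: `point` stands in for the basin; the real rule is defined
    relative to the drainage basin geometry, not a single point.
    """
    lat0, lon0 = point
    return [
        s for s in stations
        if haversine_miles(lat0, lon0, s["lat"], s["lon"]) <= max_miles
    ]


if __name__ == "__main__":
    # Hypothetical stations for illustration only.
    demo = [
        {"id": "A", "lat": 40.3, "lon": -110.0},
        {"id": "B", "lat": 41.0, "lon": -110.0},
    ]
    print([s["id"] for s in stations_within(demo, (40.0, -110.0))])
```

For a faithful reproduction of the rule, a polygon-aware distance from a geospatial library would be needed in place of the point distance.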

CDEC Snow Sensor Network

Description
The California Data Exchange Center (CDEC) facilitates the collection, storage, and exchange of hydrologic and climate information to support real-time flood management and water supply needs in California. CDEC operates snowpack monitoring stations similar to SNOTEL within California. Station metadata for snow monitoring stations is available on the data download page (cdec_snow_stations.csv).
Approved data source
CDEC APIs (https://cdec.water.ca.gov/)
Hindcast data available
Preceding October 1 through July 21 for each forecast year in Hindcast test set for stations within 40 miles of the forecast site drainage basins
Data download code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime (requires cdec_snow_stations.csv from data download page)
Direct API access permitted
For other CDEC stations whose data is not downloaded, you are permitted to directly query the CDEC APIs for data.

SNODAS

Description
The SNOw Data Assimilation System (SNODAS) estimates snow cover, snow water equivalent, and other snow parameters to support hydrologic modeling. It integrates observations from satellite and airborne platforms and from ground stations with physics models. This is a daily 1 km by 1 km data product that covers the contiguous U.S. You can read more about SNODAS here.
Approved data source
National Snow and Ice Data Center (NSIDC) (masked)
Hindcast data available
Preceding October 1 through July 21 for each forecast year in Hindcast test set
Data download code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime

UA/SWANN

Description
The Snow Water Artificial Neural Network (SWANN) system developed at the University of Arizona produces a gridded snow water equivalent data product by assimilating SNOTEL ground-based observations with temperature and precipitation data. This daily data source is available as two products: 4 km gridded data over the contiguous United States, and spatially averaged over USGS HUC regions.
Approved data source
University of Arizona: 4 km gridded data (https://climate.arizona.edu/data/UA_SWE/); HUC spatially averaged data (https://climate.arizona.edu/snowview/csv/Download/Watersheds/)
Direct API access permitted
You are permitted to directly download data from the University of Arizona file servers

MODIS Snow Cover

Description
The Moderate Resolution Imaging Spectroradiometer (MODIS) is an instrument on the Terra and Aqua spacecraft. Snow cover gridded data products with 500-meter resolution are available as daily composites (MOD10A1-061, MYD10A1-061) and 8-day composites (MOD10A2-061, MYD10A2-061). For more information, see the product catalog entries (daily; 8-day) and example notebooks (daily; 8-day) from the Microsoft Planetary Computer.
Approved data source
Microsoft Planetary Computer (https://planetarycomputer.microsoft.com/api/stac/v1/collections/modis-10A1-061; https://planetarycomputer.microsoft.com/api/stac/v1/collections/modis-10A2-061)
Direct API access permitted
You are permitted to directly download data via the Planetary Computer API
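
As a rough sketch of what a direct query could look like, the snippet below builds the JSON body for a STAC item search restricted to one collection, a bounding box, and a date range. The collection ID comes from the URLs above; the `/search` path follows the STAC API item search convention, and the helper name and example bounding box are assumptions for illustration.

```python
import json
from datetime import date

# Item search endpoint, assuming the standard STAC API layout under the
# Planetary Computer base URL listed above.
STAC_SEARCH_URL = "https://planetarycomputer.microsoft.com/api/stac/v1/search"


def build_search_body(collection: str, bbox, start: date, end: date, limit: int = 100) -> dict:
    """Build a STAC item-search request body (hypothetical helper).

    bbox is (west, south, east, north) in degrees.
    """
    west, south, east, north = bbox
    if west >= east or south >= north:
        raise ValueError("bbox must be (west, south, east, north)")
    return {
        "collections": [collection],
        "bbox": [west, south, east, north],
        # Closed RFC 3339 interval covering both end dates in full.
        "datetime": f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z",
        "limit": limit,
    }


if __name__ == "__main__":
    body = build_search_body(
        "modis-10A1-061",
        bbox=(-111.0, 40.0, -110.0, 41.0),  # illustrative box, not a real basin
        start=date(2019, 10, 1),
        end=date(2020, 7, 21),
    )
    print(STAC_SEARCH_URL)
    print(json.dumps(body, indent=2))
```

The body can be sent as a POST request with any HTTP client, or the same parameters can be passed to a STAC client library if you prefer.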

Weather and climate products

Observed and forecasted weather and climate data products can provide relevant information on the environmental conditions that affect streamflow.

RCC-ACIS

Description
The Applied Climate Information System (ACIS), maintained by NOAA Regional Climate Centers (RCCs), provides access to historical and near real-time climate observation data from a variety of sources in order to support operational users with one high quality system. Climate data products like daily temperature and precipitation observations may be especially relevant to water supply forecasting. You can read more about the ACIS Web Services APIs here.
Approved data source
ACIS Web Services (https://data.rcc-acis.org)
Direct API access permitted
You are permitted to directly query any API endpoint from ACIS Web Services
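
A request for daily station observations might be assembled as in the sketch below. It assumes the ACIS `StnData` endpoint and its `sid`, `sdate`, `edate`, and `elems` parameters as described in the ACIS Web Services documentation; the helper name and the station identifier are placeholders, so confirm the details against the documentation before use.

```python
import json
from datetime import date

# Station data endpoint, assumed to sit under the approved ACIS base URL.
ACIS_STNDATA_URL = "https://data.rcc-acis.org/StnData"


def build_stndata_params(sid: str, start: date, end: date,
                         elems=("pcpn", "maxt", "mint")) -> dict:
    """Build request parameters for daily station data (hypothetical helper)."""
    if end < start:
        raise ValueError("end date must not precede start date")
    return {
        "sid": sid,                  # station identifier
        "sdate": start.isoformat(),  # first day requested
        "edate": end.isoformat(),    # last day requested
        "elems": ",".join(elems),    # precipitation, max and min temperature
    }


if __name__ == "__main__":
    # "STATION_ID" is a placeholder, not a real station identifier.
    params = build_stndata_params("STATION_ID", date(2019, 10, 1), date(2020, 7, 21))
    print(ACIS_STNDATA_URL)
    print(json.dumps(params))
```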

CPC Seasonal Outlooks

Description
The Climate Prediction Center (CPC), a part of the National Weather Service, issues seasonal temperature and precipitation forecasts with up to 13 months lead time. These forecasts are issued for 102 geographical regions called "climate divisions" (also called "forecast divisions") defined by the CPC. You can read more about the forecasts here. Geospatial vector data for the climate divisions is available on the data download page (cpc_climate_divisions.gpkg) and can be joined to downloaded data on the CD identifier column.
Approved data source
CPC Outlook Archive
Data download code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Sample data reading code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime

Seasonal meteorological forecasts from Copernicus

Description
A multi-system seasonal forecast service that integrates global seasonal (long-range) gridded forecast products from several European forecast centers. An overview of this dataset and additional documentation from the Copernicus Climate Change Service is available here. Read more about the CDS API and cdsapi Python client here.
Approved data source
Copernicus Climate Data Store
Direct API access permitted
You are permitted to directly query data needed using the cdsapi Python client

Seasonal fire danger indices forecasts from CEMS

Description
Long-range forecasts of global daily fire danger produced by the Copernicus Emergency Management Service (CEMS) using the Global ECMWF Fire Forecast (GEFF) model. Daily forecasts are made with a lead time of up to 216 days (approximately 7 months). This dataset includes many different fire danger indices used by different countries. An overview of this dataset and additional documentation from the Copernicus Climate Change Service is available here. Read more about the CDS API and cdsapi Python client here.
Approved data source
Copernicus Climate Data Store
Direct API access permitted
You are permitted to directly query data needed using the cdsapi Python client

ERA5-Land and ERA5-Land-T reanalysis

Updated on December 13, 2023 to include monthly averaged data as approved.

Description
ERA5-Land is a global reanalysis dataset of land variables. This data source includes both ERA5-Land, which is published with a 2–3 month lag, and ERA5-Land-T, which is a non-checked version published in near-real-time. This data is available at hourly resolution and as monthly averages. An overview of this dataset and additional documentation from the Copernicus Climate Change Service is available here: hourly, monthly averaged. Read more about the CDS API and cdsapi Python client here.
Approved data source
Copernicus Climate Data Store: hourly, monthly averaged
Direct API access permitted
You are permitted to directly query data needed using the cdsapi Python client
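
A query through the cdsapi client could be sketched roughly as follows. The request keys and the dataset name `reanalysis-era5-land` reflect common CDS usage but are assumptions here, as are the variable name and helper names; the accepted keys have changed between CDS versions, so check the dataset's download form before use. Running the download also requires the cdsapi package and CDS credentials.

```python
def build_era5_land_request(variable: str, year: int, months, area) -> dict:
    """Build an ERA5-Land request dictionary (hypothetical helper).

    area is (north, west, south, east) in degrees, the order the CDS
    API conventionally expects for bounding boxes.
    """
    north, west, south, east = area
    if north <= south:
        raise ValueError("area must be (north, west, south, east)")
    return {
        "variable": [variable],
        "year": str(year),
        "month": [f"{m:02d}" for m in months],
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": ["00:00"],  # one hourly step per day keeps the request small
        "area": [north, west, south, east],
    }


def download_era5_land(request: dict, target: str) -> None:
    """Submit the request; needs cdsapi installed and credentials configured."""
    import cdsapi  # imported lazily so the request builder works without it

    client = cdsapi.Client()
    client.retrieve("reanalysis-era5-land", request, target)


if __name__ == "__main__":
    req = build_era5_land_request(
        "snow_depth_water_equivalent",  # assumed variable name
        year=2019,
        months=[10, 11, 12],
        area=(41.0, -111.0, 40.0, -110.0),  # illustrative box only
    )
    print(req)
```

Keeping the request construction separate from the download call makes it easy to inspect or log exactly what was requested for each issue date.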

NLDAS-2 forcing data

Description
North American Land Data Assimilation System (NLDAS) uses numerical physics models integrated with ground- and space-based observing systems to produce fields of water and energy states and fluxes. The forcing data includes a variety of meteorological variables that are inputs to this model, such as precipitation, wind speed, average air temperature, incoming radiation, and surface pressure. You can read more about the NLDAS-2 forcing dataset here. You can view the datasets from the Goddard Earth Sciences Data and Information Services Center (GES DISC) here. A Python client PyNLDAS2 is available.
Approved data source
GES DISC (https://hydro1.gesdisc.eosdis.nasa.gov/)
Direct API access permitted
You are permitted to directly download data from GES DISC

NCEP/NCAR Reanalysis 1

Description
The NCEP/NCAR Reanalysis 1 is a reanalysis dataset. It is a gridded data product of atmospheric and land variables produced by data assimilation of numerical weather models with observational climate data. You can read more about the dataset here.
Approved data source
NOAA PSL Downloads Server (https://downloads.psl.noaa.gov/Datasets/ncep.reanalysis) or THREDDS Server (https://psl.noaa.gov/thredds/catalog/Datasets/ncep.reanalysis)
Direct API access permitted
You are permitted to directly download data from NOAA PSL

USGS SSEBop Evapotranspiration

Description
Evapotranspiration (ET) is the movement of water into the atmosphere that combines evaporation and transpiration. USGS provides ET data products based on remote sensing data and an operational Simplified Surface Energy Balance (SSEBop) model. The v5 data product uses MODIS thermal imagery and covers 2003–2022, while the v6 data product uses VIIRS thermal imagery and covers 2012 to present. Since neither of the versions cover the full period of the challenge, you will need to use both. Note that you will need to adjust the v5 MODIS values to be comparable to the v6 VIIRS values, as they are derived from different remote sensing data and are not the same. Please document your adjustment methodology in the model report. Both versions are available as dekadal (10-day) and monthly products. You can read more about these products from USGS (v5 MODIS; v6 VIIRS).
Approved data source
USGS FEWS Net file servers (https://edcintl.cr.usgs.gov): v5 MODIS dekadal, v5 MODIS monthly, v6 VIIRS dekadal, v6 VIIRS monthly
Direct API access permitted
You are permitted to directly download data from USGS FEWS Net file servers
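
One simple way to make v5 MODIS values comparable to v6 VIIRS values is to rescale v5 by the ratio of means over the period where both exist (2012–2022). The sketch below shows only that one option under stated assumptions: it is not a method prescribed by the challenge, the function names are hypothetical, the numbers are synthetic, and other approaches (for example a regression fit per basin or per season) may suit your model better.

```python
from statistics import mean


def overlap_scale_factor(v5: dict, v6: dict) -> float:
    """Ratio of mean v6 to mean v5 over the time steps present in both."""
    common = sorted(set(v5) & set(v6))
    if not common:
        raise ValueError("v5 and v6 share no time steps")
    return mean(v6[k] for k in common) / mean(v5[k] for k in common)


def harmonize(v5: dict, v6: dict) -> dict:
    """Rescale v5 to the v6 level, then prefer v6 wherever it exists."""
    factor = overlap_scale_factor(v5, v6)
    merged = {k: value * factor for k, value in v5.items()}
    merged.update(v6)  # v6 values overwrite rescaled v5 in the overlap
    return merged


if __name__ == "__main__":
    # Synthetic ET values keyed by year, for illustration only.
    v5 = {2010: 100.0, 2012: 100.0, 2013: 120.0}
    v6 = {2012: 110.0, 2013: 132.0, 2023: 90.0}
    print(harmonize(v5, v6))
```

Whatever method you choose, consider whether the overlap data used to fit the adjustment would have been available at each issue date, and describe the choice in your model report as requested above.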

Drought and moisture conditions

Palmer Drought Severity Index (PDSI) from gridMET

Description
The Palmer Drought Severity Index (PDSI) is a measure of drought based on a soil moisture model applied to precipitation and temperature data. This particular data source is a gridded pentad (every 5 days) PDSI product produced from gridMET meteorological data and USDA STATSGO soil data. You can read more about this PDSI data product here.
Approved data source
University of Idaho Northwest Knowledge Network (NetCDF format—see "NetcdfSubset")
Hindcast data available
Preceding October 1 through July 21 for each forecast year in Hindcast test set
Data download code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime

GRACE-based Soil Moisture and Groundwater Drought Indicators

Description
Soil moisture and groundwater drought indicators derived from GRACE-FO satellite data. GRACE-FO is a satellite mission that maps the Earth's gravitational field, a measurement of spatial mass concentration. This data product incorporates GRACE-FO observations with other data and a numerical model. There are three indicators: surface soil moisture (top 2 cm of soil), root zone soil moisture (top 1 m of soil), and shallow groundwater. You can read more about the data here.
Approved data source
National Drought Mitigation Center (CONUS, NetCDF4 format)
Hindcast data available
Preceding October 1 through July 21 for each forecast year in Hindcast test set
Data download code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime

Climate teleconnection indices

Teleconnection refers to climate patterns or anomalies in one region of the world that are correlated with and often influence weather patterns in distant parts of the globe.

Oceanic Niño Index (ONI)

Description
3-month running average of sea surface temperature anomalies in the Niño 3.4 region. This measure is used as an indicator of the El Niño–Southern Oscillation phenomenon. Warm (El Niño) and cold (La Niña) phases are defined as a minimum of five consecutive ONI values surpassing a threshold of +/- 0.5°C. You can read more about the ONI here.
Approved data source
National Centers for Environmental Information (NCEI)
Data download code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Sample data reading code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
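
The phase definition above can be illustrated with a short sketch: a 3-month running mean over monthly anomalies, followed by labeling runs of at least five consecutive values beyond the ±0.5°C threshold. The function names are hypothetical, the input values are synthetic, and treating the threshold as inclusive is an assumption; the published ONI values already include the running average, so the first helper is only for illustration.

```python
def running_mean_3(monthly):
    """3-month running average; output is two elements shorter than input."""
    return [sum(monthly[i:i + 3]) / 3 for i in range(len(monthly) - 2)]


def enso_phases(oni, threshold=0.5, min_run=5):
    """Label each ONI value 'warm', 'cold', or 'neutral'.

    A value is labeled warm (cold) only if it belongs to a run of at least
    min_run consecutive values at or above +threshold (at or below -threshold).
    """
    labels = ["neutral"] * len(oni)
    for sign, name in ((1, "warm"), (-1, "cold")):
        run_start = None
        # A trailing neutral sentinel closes any run that reaches the end.
        for i, value in enumerate(list(oni) + [0.0]):
            if sign * value >= threshold:
                if run_start is None:
                    run_start = i
            else:
                if run_start is not None and i - run_start >= min_run:
                    labels[run_start:i] = [name] * (i - run_start)
                run_start = None
    return labels


if __name__ == "__main__":
    # Synthetic ONI values: five warm values, then only four cold values.
    oni = [0.6] * 5 + [0.1] + [-0.7] * 4 + [0.0]
    print(enso_phases(oni))
```

In this synthetic series the first five values qualify as a warm phase, while the four cold values fall short of the five-value minimum and stay neutral.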

Niño Regions Sea Surface Temperatures

Description
Monthly sea surface temperature anomalies in Niño regions. These are additional measures related to the El Niño–Southern Oscillation phenomenon. You can read more about these measures here.
Approved data source
National Centers for Environmental Information (NCEI)
Data download code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Sample data reading code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime

Southern Oscillation Index (SOI)

Description
Standardized sea level pressure differences between Tahiti and Darwin, Australia. This measure is used as an indicator of the El Niño–Southern Oscillation phenomenon. The index is negative when there is below-normal air pressure at Tahiti and above-normal air pressure at Darwin, and vice versa when the index is positive. Periods of negative values coincide with El Niño and positive values coincide with La Niña. You can read more about the SOI here.
Approved data source
National Centers for Environmental Information (NCEI)
Data download code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Sample data reading code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime

Madden-Julian Oscillation (MJO) Pentad Indices

Description
The Madden-Julian Oscillation is an eastward moving weather pattern with a typical period of 30 to 60 days. The pentad indices are normalized projections of pentad velocity potential on patterns from extended empirical orthogonal function analysis on historical reference data from 1979 to 2000. You can read more about the Madden-Julian Oscillation from Climate.gov, and the methodology for the indices from the CPC.
Approved data source
Climate Prediction Center (CPC)
Data download code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Sample data reading code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime

Pacific North American (PNA) Index

Description
The Pacific North American (PNA) pattern is a large-scale weather pattern in the atmospheric circulation over the Pacific Ocean and North America. The index is the projection of the air pressure field on a particular mode from empirical orthogonal function analysis of reference data from 1950 to 2000. You can read more about the PNA pattern from Climate.gov and from NCEI.
Approved data source
Climate Prediction Center (CPC)
Data download code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Sample data reading code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime

Pacific Decadal Oscillation (PDO) Index

Description
The Pacific Decadal Oscillation (PDO) is a climate pattern of the Pacific Ocean that is characterized by warm and cool phases in sea surface temperature. It is similar to the El Niño–Southern Oscillation but has a longer time scale, with phases that can persist for 20 to 30 years. The index is calculated by projecting sea surface temperatures on the first principal component of reference data from 1900 to 1993. You can read more about the PDO index from JISAO or from NCEI.
Approved data source
National Centers for Environmental Information (NCEI)
Data download code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Sample data reading code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime

Vegetation conditions

MODIS Vegetation Indices

Description
The Moderate Resolution Imaging Spectroradiometer (MODIS) is an instrument on the Terra and Aqua spacecraft. The Vegetation Indices 16-day data product includes global Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) measures of vegetation. For more information, see this product's catalog entry and example notebook from the Microsoft Planetary Computer.
Approved data source
Microsoft Planetary Computer (https://planetarycomputer.microsoft.com/api/stac/v1/collections/modis-13A1-061)
Hindcast data available
Preceding October 1 through July 21 for each forecast year in Hindcast test set for STAC items that spatially intersect with the forecast site drainage basins
Data download code
Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Direct API access permitted
For additional items, you are permitted to directly download via the Planetary Computer API

Land and Elevation

Copernicus DEM GLO-90

Description
The Copernicus Digital Elevation Model (DEM) is an elevation dataset that represents the surface of the Earth, including buildings, infrastructure, and vegetation. The data comes from the TanDEM-X mission. The GLO-90 data product has a horizontal resolution of approximately 90 meters. For more information, see this product's catalog entry and example notebook from the Microsoft Planetary Computer.
Approved data source
Microsoft Planetary Computer (https://planetarycomputer.microsoft.com/api/stac/v1/collections/cop-dem-glo-90)
Direct API access permitted
You are permitted to directly download data via the Planetary Computer API

National Land Cover Database (NLCD) Urban Imperviousness

Description
The National Land Cover Database (NLCD) is a set of data products on land cover and land cover change for the contiguous United States published by USGS and MRLC. Urban imperviousness refers to developed surfaces, such as pavement and rooftops, that water cannot penetrate. The 2021 NLCD urban imperviousness product is approved for use in the challenge. You can read more about this product here. Important: NLCD is an update-based data product with releases corresponding to the year of the source imagery: 2001, 2006, 2011, 2016, 2019, and 2021. NLCD releases take many years to prepare, and the actual release date is typically several years later than the source imagery year. For example, the 2011 product was not available until 2014-03-31. In order to reflect operational conditions when performing inference, you must use only the latest epoch available based on release date. For example, if making predictions for the 2013-01-01 issue date, the latest release available at that time was the 2006 product (released on 2011-02-16), and so the latest available epoch would have been 2006. A CSV file nlcd_release_dates.csv containing release dates for each version is available on the data download page.
Approved data source
MRLC (19.54 GB, ZIP archive)
Hindcast data available
The 2021 NLCD imperviousness ZIP archive (NLCD_impervious_2021_release_all_files_20230630.zip) and companion nlcd_release_dates.csv are directly available in the mounted data drive
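
The release-date rule described above can be sketched as a small lookup. The function name is hypothetical, and the example includes only the two release dates stated in the description; the full set should be read from nlcd_release_dates.csv, whose column layout is not assumed here.

```python
from datetime import date
from typing import Optional


def latest_available_epoch(release_dates: dict, issue_date: date) -> Optional[int]:
    """Return the newest NLCD epoch released on or before the issue date.

    release_dates maps epoch year -> release date. Returns None when no
    epoch had been released yet.
    """
    available = [
        epoch for epoch, released in release_dates.items()
        if released <= issue_date
    ]
    return max(available) if available else None


if __name__ == "__main__":
    # Only the two release dates quoted in the description are used here.
    releases = {
        2006: date(2011, 2, 16),
        2011: date(2014, 3, 31),
    }
    print(latest_available_epoch(releases, date(2013, 1, 1)))
```

With these two entries, an issue date of 2013-01-01 selects the 2006 epoch, matching the example in the description.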

BasinATLAS Basin Attributes

Description
The BasinATLAS dataset, a part of the HydroSHEDS database, is a collection of hydrological, physiographic, climate, land cover, geological, and anthropogenic variables describing sub-basins globally. Version 10 of the BasinATLAS dataset is approved for use in the challenge. You can read more about BasinATLAS here and see the detailed catalog of variables here. Important: BasinATLAS variables are derived from a diverse set of source datasets each with different time coverage. For any BasinATLAS variables used, you should clearly document in your model report the source data provenance and justify why the variable does not leverage future data in an unrealistic way for operational use or leak information about the test set.
Approved data source
HydroSHEDS via Figshare (2.7 GB, geodatabase format)
Hindcast data available
BasinATLAS v10 compressed geodatabase (BasinATLAS_Data_v10.gdb.zip) is directly available in the mounted data drive

Don't see a data source you want to use? Please see the documentation on requesting approval for additional data.