Approved data sources
Last updated: December 14, 2023
This page documents all approved data sources for use as feature data (i.e., predictor or input data) for your forecast models. Valid submissions must use only approved data sources as features. If you need additional data from an already approved source (e.g., from a longer time range), or if you would like to use data from another source, please refer to the section on requesting approval for additional data for more details.
As a reminder, for code execution in the Hindcast and Forecast Stage evaluations, you will be required to use copies of the feature data available within the runtime environment unless otherwise specified. For each data source, the following will be documented:
- Approved data source — specific source that approved data will be downloaded from
- Hindcast data available — details about what specific parameters of data will be available in code execution, if applicable.
- Direct API access permitted — only if direct API access to pull data during test set inference is permitted for this data source
- Data download code — the code used by DrivenData to download data for the code execution runtime.
- Sample data reading code — sample code that can be used to load the data. This will be made available as part of an installed package wsfr-read within the runtime environment.
Antecedent streamflow
NRCS and RFCs monthly naturalized flow
- Description
- Naturalized flow at the forecast sites from NRCS and the RFCs. These are the past monthly time series observations for the forecast target variable. See the problem description page for additional discussion.
- Approved data source
- See test_monthly_naturalized_flow.csv on the data download page.
- Hindcast data available
- Preceding October 1 through May or June for each forecast year in Hindcast test set, depending on the site's forecast season. Monthly data is only available for 23 of the 26 forecast sites.
USGS streamflow
- Description
- Daily observed streamflow measurements from the U.S. Geological Survey (USGS) recorded at USGS streamgages. These measurements represent the actual observed flow of water at specific locations, not the naturalized flow being forecasted. Solutions should not attempt to model the adjustments for calculating naturalized flow, as these are impacted by water management operations that may in general be influenced by forecasts. However, observed streamflow at the forecast sites or at other locations may still be a useful predictor of the overall drainage basin condition. The runtime environment will include daily measurements for the specific USGS streamgages located at the forecast sites (available for 25 of the 26 sites, see the provided metadata.csv), but you are permitted to use data from any other location by directly accessing the USGS Water Services APIs. View additional details via the USGS Water Services API documentation.
- Approved data source
- USGS Water Services: (https://waterservices.usgs.gov/nwis)
- Hindcast data available
- Preceding October 1 through July 21 for each forecast year in Hindcast test set for gages at 25 of the 26 forecast sites.
- Data download code (for gages at forecast sites)
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
- Direct API access permitted
- For locations other than the forecast sites, you are permitted to directly query the USGS Water Services APIs for data.
- Sample data reading code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
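As a rough illustration of the hindcast window and the direct query described above, the sketch below computes the October 1 through July 21 range for a forecast year and builds a request URL for the USGS daily values service. The function names are hypothetical and the site number is a placeholder. The query parameters shown (format, sites, parameterCd, startDT, endDT) are assumptions that should be checked against the USGS Water Services API documentation before use.

```python
from datetime import date
from typing import Tuple
from urllib.parse import urlencode

# Daily values endpoint under the approved USGS Water Services base URL.
USGS_DV_URL = "https://waterservices.usgs.gov/nwis/dv/"


def hindcast_window(forecast_year: int) -> Tuple[date, date]:
    """Return (start, end): the preceding October 1 through July 21."""
    return date(forecast_year - 1, 10, 1), date(forecast_year, 7, 21)


def usgs_daily_values_url(site_no: str, forecast_year: int) -> str:
    """Build a daily streamflow query URL for one gage (hypothetical helper)."""
    start, end = hindcast_window(forecast_year)
    params = {
        "format": "json",
        "sites": site_no,
        "parameterCd": "00060",  # discharge parameter code (assumed)
        "startDT": start.isoformat(),
        "endDT": end.isoformat(),
    }
    return f"{USGS_DV_URL}?{urlencode(params)}"


# "12345678" is a placeholder, not a real forecast-site gage.
url = usgs_daily_values_url("12345678", 2020)
print(url)
```

The URL can then be fetched with any HTTP client. Building it separately keeps the date logic easy to check without network access.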
USBR reservoir inflow
- Description
- Metered or calculated data on inflow into U.S. Bureau of Reclamation (USBR) reservoirs. These inflow measurements represent the actual observed flow of water, not the naturalized flow being forecasted. Solutions should not attempt to model the adjustments for calculating naturalized flow, as these are impacted by water management operations that may in general be influenced by forecasts. However, observed flow at some locations may still be a useful predictor of the overall drainage basin condition. View additional details via the USBR RISE API documentation.
- Approved data source
- USBR RISE API: (https://data.usbr.gov/rise/api)
- Direct API access permitted
- You are permitted to directly query reservoir inflow measurements from the USBR RISE API
Snowpack
NRCS SNOTEL
- Description
- The Snow Telemetry (SNOTEL) network is composed of over 900 automated data collection sites located in remote, high-elevation mountain watersheds in the Western U.S. They are used to monitor snowpack, precipitation, temperature, and other climatic conditions. You can read more about the SNOTEL network here. You can read more about the NRCS Air-Water Database (AWDB) Web Service here.
- Approved data source
- NRCS AWDB Web Service: SOAP (https://wcc.sc.egov.usda.gov/awdbWebService/services?WSDL), REST (https://wcc.sc.egov.usda.gov/awdbRestApi/services)
- Hindcast data available
- Preceding October 1 through July 21 for each forecast year in Hindcast test set for stations within 40 miles of the forecast site drainage basins
- Data download code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
- Direct API access permitted
- For other SNOTEL stations whose data is not downloaded, you are permitted to directly query the NRCS AWDB APIs for data.
CDEC Snow Sensor Network
- Description
- The California Data Exchange Center (CDEC) facilitates the collection, storage, and exchange of hydrologic and climate information to support real-time flood management and water supply needs in California. CDEC operates snowpack monitoring stations similar to SNOTEL within California. Station metadata for snow monitoring stations is available on the data download page (cdec_snow_stations.csv).
- Approved data source
- CDEC APIs (https://cdec.water.ca.gov/)
- Hindcast data available
- Preceding October 1 through July 21 for each forecast year in Hindcast test set for stations within 40 miles of the forecast site drainage basins
- Data download code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime (requires cdec_snow_stations.csv from data download page)
- Direct API access permitted
- For other CDEC stations whose data is not downloaded, you are permitted to directly query the CDEC APIs for data.
SNODAS
- Description
- The SNOw Data Assimilation System (SNODAS) estimates snow cover, snow water equivalent, and other snow parameters to support hydrologic modeling. It integrates observations from satellite and airborne platforms and from ground stations with physics models. This is a daily 1 km by 1 km data product that covers the contiguous U.S. You can read more about SNODAS here.
- Approved data source
- National Snow and Ice Data Center (NSIDC) (masked)
- Hindcast data available
- Preceding October 1 through July 21 for each forecast year in Hindcast test set
- Data download code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
UA/SWANN
- Description
- The Snow Water Artificial Neural Network (SWANN) system developed at the University of Arizona produces a gridded snow water equivalent data product by assimilating SNOTEL ground-based observations with temperature and precipitation data. This daily data source is available as two products: 4 km gridded data over the contiguous United States, and spatially averaged over USGS HUC regions.
- Approved data source
- University of Arizona: 4 km gridded data (https://climate.arizona.edu/data/UA_SWE/); HUC spatially averaged data (https://climate.arizona.edu/snowview/csv/Download/Watersheds/)
- Direct API access permitted
- You are permitted to directly download data from the University of Arizona file servers
MODIS Snow Cover
- Description
- The Moderate Resolution Imaging Spectroradiometer (MODIS) is an instrument on the Terra and Aqua spacecraft. Snow cover gridded data products with 500-meter resolution are available as daily composites (MOD10A1-061, MYD10A1-061) and 8-day composites (MOD10A2-061, MYD10A2-061). For more information, see the product catalog entries (daily; 8-day) and example notebooks (daily; 8-day) from the Microsoft Planetary Computer.
- Approved data source
- Microsoft Planetary Computer (https://planetarycomputer.microsoft.com/api/stac/v1/collections/modis-10A1-061; https://planetarycomputer.microsoft.com/api/stac/v1/collections/modis-10A2-061)
- Direct API access permitted
- You are permitted to directly download data via the Planetary Computer API
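The Planetary Computer serves these collections through a STAC API. The sketch below shows one way to assemble a search request body for the daily snow cover collection over a hindcast window, using the standard STAC item search fields (collections, bbox, datetime). The function name is hypothetical and the bounding box is a placeholder, not a real forecast basin.

```python
import json
from datetime import date


def stac_search_body(collection, bbox, start, end):
    """Build a STAC item search body (hypothetical helper).

    bbox is (west, south, east, north) in degrees.
    """
    return {
        "collections": [collection],
        "bbox": list(bbox),
        # Closed datetime interval covering the whole start and end days.
        "datetime": f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z",
    }


body = stac_search_body(
    "modis-10A1-061",
    (-121.0, 46.0, -120.0, 47.0),  # placeholder bounding box
    date(2019, 10, 1),
    date(2020, 7, 21),
)
print(json.dumps(body, indent=2))
```

A body like this could be sent as a POST request to the API's search endpoint, or the same values could be passed to a STAC client library.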
Weather and climate products
Observed and forecasted weather and climate data products can provide relevant information on the environmental conditions that affect streamflow.
RCC-ACIS
- Description
- The Applied Climate Information System (ACIS), maintained by NOAA Regional Climate Centers (RCCs), provides access to historical and near real-time climate observation data from a variety of sources in order to support operational users with one high quality system. Climate data products like daily temperature and precipitation observations may be especially relevant to water supply forecasting. You can read more about the ACIS Web Services APIs here.
- Approved data source
- ACIS Web Services (https://data.rcc-acis.org)
- Direct API access permitted
- You are permitted to directly query any API endpoint from ACIS Web Services
CPC Seasonal Outlooks
- Description
- The Climate Prediction Center (CPC), a part of the National Weather Service, issues seasonal temperature and precipitation forecasts with up to 13 months lead time. These forecasts are issued for 102 geographical regions called "climate divisions" (also called "forecast divisions") defined by the CPC. You can read more about the forecasts here. Geospatial vector data for the climate divisions is available on the data download page (cpc_climate_divisions.gpkg) and can be joined to downloaded data on the CD identifier column.
- Approved data source
- CPC Outlook Archive
- Data download code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
- Sample data reading code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Seasonal meteorological forecasts from Copernicus
- Description
- A multi-system seasonal forecast service that integrates global seasonal (long-range) gridded forecast products from several European forecast centers. An overview of this dataset and additional documentation from the Copernicus Climate Change Service is available here. Read more about the CDS API and cdsapi Python client here.
- Approved data source
- Copernicus Climate Data Store
- Direct API access permitted
- You are permitted to directly query data needed using the cdsapi Python client
Seasonal fire danger indices forecasts from CEMS
- Description
- Long-range forecasts of global daily fire danger produced by the Copernicus Emergency Management Service (CEMS) using the Global ECMWF Fire Forecast (GEFF) model. Daily forecasts are made with a lead time of up to 216 days (approximately 7 months). This dataset includes many different fire danger indices used by different countries. An overview of this dataset and additional documentation from the Copernicus Climate Change Service is available here. Read more about the CDS API and cdsapi Python client here.
- Approved data source
- Copernicus Climate Data Store
- Direct API access permitted
- You are permitted to directly query data needed using the cdsapi Python client
ERA5-Land and ERA5-Land-T reanalysis
Updated on December 13, 2023 to include monthly averaged data as approved.
- Description
- ERA5-Land is a global reanalysis dataset of land variables. This data source includes both ERA5-Land, which is published with a 2–3 month lag, and ERA5-Land-T, which is a non-checked version published in near-real-time. This data is available at hourly resolution and as monthly averages. An overview of this dataset and additional documentation from the Copernicus Climate Change Service is available here: hourly, monthly averaged. Read more about the CDS API and cdsapi Python client here.
- Approved data source
- Copernicus Climate Data Store: hourly, monthly averaged
- Direct API access permitted
- You are permitted to directly query data needed using the cdsapi Python client
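For orientation, a query with the cdsapi client might look roughly like the sketch below, which requests ERA5-Land monthly averages for a bounding box. The dataset name, variable names, and request keys are assumptions based on common CDS usage and have changed between CDS versions, so check them against the dataset's download form. The helper names and the bounding box are hypothetical. The download step needs the cdsapi package and CDS credentials, so it is wrapped in a function and not called here.

```python
def era5_land_monthly_request(year, months, area):
    """Build a request dict for ERA5-Land monthly averages (assumed keys).

    area is [north, west, south, east] in degrees.
    """
    return {
        "product_type": "monthly_averaged_reanalysis",
        "variable": ["snow_depth_water_equivalent", "total_precipitation"],
        "year": str(year),
        "month": [f"{m:02d}" for m in months],
        "time": "00:00",
        "area": list(area),
        "format": "netcdf",
    }


def download(request, target):
    """Submit the request; requires cdsapi and configured CDS credentials."""
    import cdsapi  # imported here so the sketch loads without the package

    client = cdsapi.Client()
    # Dataset name is an assumption; confirm it on the CDS dataset page.
    client.retrieve("reanalysis-era5-land-monthly-means", request, target)


# Placeholder bounding box, not a real forecast basin.
request = era5_land_monthly_request(2020, [1, 2, 3], [49, -125, 31, -102])
print(request)
```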
NLDAS-2 forcing data
- Description
- North American Land Data Assimilation System (NLDAS) uses numerical physics models integrated with ground- and space-based observing systems to produce fields of water and energy states and fluxes. The forcing data includes a variety of meteorological variables that are inputs to this model, such as precipitation, wind speed, average air temperature, incoming radiation, and surface pressure. You can read more about the NLDAS-2 forcing dataset here. You can view the datasets from the Goddard Earth Sciences Data and Information Services Center (GES DISC) here. A Python client PyNLDAS2 is available.
- Approved data source
- GES DISC (https://hydro1.gesdisc.eosdis.nasa.gov/)
- Direct API access permitted
- You are permitted to directly download data from GES DISC
NCEP/NCAR Reanalysis 1
- Description
- The NCEP/NCAR Reanalysis 1 is a reanalysis dataset. It is a gridded data product of atmospheric and land variables produced by data assimilation of numerical weather models with observational climate data. You can read more about the dataset here.
- Approved data source
- NOAA PSL Downloads Server (https://downloads.psl.noaa.gov/Datasets/ncep.reanalysis) or THREDDS Server (https://psl.noaa.gov/thredds/catalog/Datasets/ncep.reanalysis)
- Direct API access permitted
- You are permitted to directly download data from NOAA PSL
USGS SSEBop Evapotranspiration
- Description
- Evapotranspiration (ET) is the movement of water into the atmosphere that combines evaporation and transpiration. USGS provides ET data products based on remote sensing data and an operational Simplified Surface Energy Balance (SSEBop) model. The v5 data product uses MODIS thermal imagery and covers 2003–2022, while the v6 data product uses VIIRS thermal imagery and covers 2012 to present. Since neither version covers the full period of the challenge, you will need to use both. Note that you will need to adjust the v5 MODIS values to be comparable to the v6 VIIRS values, as they are derived from different remote sensing data and are not directly interchangeable. Please document your adjustment methodology in the model report. Both versions are available as dekadal (10-day) and monthly products. You can read more about these products from USGS (v5 MODIS; v6 VIIRS).
- Approved data source
- USGS FEWS Net file servers (https://edcintl.cr.usgs.gov): v5 MODIS dekadal, v5 MODIS monthly, v6 VIIRS dekadal, v6 VIIRS monthly
- Direct API access permitted
- You are permitted to directly download data from USGS FEWS Net file servers
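The challenge does not prescribe how to adjust v5 MODIS values to match v6 VIIRS values. One possible approach, shown here only as a sketch, is to fit a linear relationship on periods where both versions exist and apply it to v5 values. The numbers below are made up for illustration and the function names are hypothetical. A real adjustment might need to vary by location or season.

```python
def fit_linear(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up ET values for the same place and periods in both versions.
v5_overlap = [10.0, 20.0, 30.0, 40.0]
v6_overlap = [12.0, 23.0, 34.0, 45.0]

slope, intercept = fit_linear(v5_overlap, v6_overlap)


def adjust_v5(value):
    """Map a v5 MODIS value onto the v6 VIIRS scale using the fitted line."""
    return slope * value + intercept


print(slope, intercept, adjust_v5(50.0))
```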
Drought and moisture conditions
Palmer Drought Severity Index (PDSI) from gridMET
- Description
- The Palmer Drought Severity Index (PDSI) is a measure of drought based on a soil moisture model applied to precipitation and temperature data. This particular data source is a gridded pentad (every 5 days) PDSI product produced from gridMET meteorological data and USDA STATSGO soil data. You can read more about this PDSI data product here.
- Approved data source
- University of Idaho Northwest Knowledge Network (NetCDF format—see "NetcdfSubset")
- Hindcast data available
- Preceding October 1 through July 21 for each forecast year in Hindcast test set
- Data download code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
GRACE-based Soil Moisture and Groundwater Drought Indicators
- Description
- Soil moisture and groundwater drought indicators derived from GRACE-FO satellite data. GRACE-FO is a satellite mission that maps the Earth's gravitational field, a measurement of spatial mass concentration. This data product incorporates GRACE-FO observations with other data and a numerical model. There are three indicators: surface soil moisture (top 2 cm of soil), root zone soil moisture (top 1 m of soil), and shallow groundwater. You can read more about the data here.
- Approved data source
- National Drought Mitigation Center (CONUS, NetCDF4 format)
- Hindcast data available
- Preceding October 1 through July 21 for each forecast year in Hindcast test set
- Data download code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Climate teleconnection indices
Teleconnection refers to climate patterns or anomalies in one region of the world that are correlated with and often influence weather patterns in distant parts of the globe.
Oceanic Niño Index (ONI)
- Description
- 3-month running average of sea surface temperature anomalies in the Niño 3.4 region. This measure is used as an indicator of the El Niño–Southern Oscillation phenomenon. Warm (El Niño) and cold (La Niña) phases are defined as a minimum of five consecutive ONI values surpassing a threshold of +/- 0.5°C. You can read more about the ONI here.
- Approved data source
- National Centers for Environmental Information (NCEI)
- Data download code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
- Sample data reading code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
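The phase definition in the description above (at least five consecutive ONI values beyond +/- 0.5°C) can be sketched as a simple run-length check. The function name and the sample values are made up for illustration. Treating values exactly at the threshold as meeting it is an assumption.

```python
def enso_phases(oni, threshold=0.5, min_run=5):
    """Label each ONI value as "el_nino", "la_nina", or "neutral"."""

    def sign(value):
        # Inclusive comparison at the threshold is an assumption.
        if value >= threshold:
            return 1
        if value <= -threshold:
            return -1
        return 0

    labels = ["neutral"] * len(oni)
    i = 0
    while i < len(oni):
        s = sign(oni[i])
        j = i
        # Extend the run while values stay on the same side of the threshold.
        while j < len(oni) and sign(oni[j]) == s:
            j += 1
        if s != 0 and j - i >= min_run:
            name = "el_nino" if s > 0 else "la_nina"
            labels[i:j] = [name] * (j - i)
        i = j
    return labels


# Made-up values: a five-value warm run, then a cold run that is too short.
sample = [0.6, 0.7, 0.9, 0.8, 0.5, 0.2, -0.6, -0.7, -0.3]
phases = enso_phases(sample)
print(phases)
```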
Niño Regions Sea Surface Temperatures
- Description
- Monthly sea surface temperature anomalies in Niño regions. These are additional measures related to the El Niño–Southern Oscillation phenomenon. You can read more about these measures here.
- Approved data source
- National Centers for Environmental Information (NCEI)
- Data download code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
- Sample data reading code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Southern Oscillation Index (SOI)
- Description
- Standardized sea level pressure differences between Tahiti and Darwin, Australia. This measure is used as an indicator of the El Niño–Southern Oscillation phenomenon. The index is negative when there is below-normal air pressure at Tahiti and above-normal air pressure at Darwin, and vice versa when the index is positive. Periods of negative values coincide with El Niño and positive values coincide with La Niña. You can read more about the SOI here.
- Approved data source
- National Centers for Environmental Information (NCEI)
- Data download code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
- Sample data reading code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Madden-Julian Oscillation (MJO) Pentad Indices
- Description
- The Madden-Julian Oscillation is an eastward moving weather pattern with a typical period of 30 to 60 days. The pentad indices are normalized projections of pentad velocity potential on patterns from extended empirical orthogonal function analysis on historical reference data from 1979 to 2000. You can read more about the Madden-Julian Oscillation from Climate.gov, and the methodology for the indices from the CPC.
- Approved data source
- Climate Prediction Center (CPC)
- Data download code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
- Sample data reading code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Pacific North American (PNA) Index
- Description
- The Pacific North American (PNA) pattern is a large-scale weather pattern in the atmospheric circulation over the Pacific Ocean and North America. The index is the projection of the air pressure field on a particular mode from empirical orthogonal function analysis of reference data from 1950 to 2000. You can read more about the PNA pattern from Climate.gov and from NCEI.
- Approved data source
- Climate Prediction Center (CPC)
- Data download code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
- Sample data reading code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Pacific Decadal Oscillation (PDO) Index
- Description
- The Pacific Decadal Oscillation (PDO) is a climate pattern of the Pacific Ocean that is characterized by warm and cool phases in sea surface temperature. It is similar to the El Niño–Southern Oscillation but has a longer time scale, with phases that can persist for 20 to 30 years. The index is calculated from projecting sea surface temperatures on the first principal component of reference data from 1900 to 1993. You can read more about the PDO index from JISAO or from NCEI.
- Approved data source
- National Centers for Environmental Information (NCEI)
- Data download code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
- Sample data reading code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
Vegetation conditions
MODIS Vegetation Indices
- Description
- The Moderate Resolution Imaging Spectroradiometer (MODIS) is an instrument on the Terra and Aqua spacecraft. The Vegetation Indices 16-day data product includes global Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) measures of vegetation. For more information, see this product's catalog entry and example notebook from the Microsoft Planetary Computer.
- Approved data source
- Microsoft Planetary Computer (https://planetarycomputer.microsoft.com/api/stac/v1/collections/modis-13A1-061)
- Hindcast data available
- Preceding October 1 through July 21 for each forecast year in Hindcast test set for STAC items that spatially intersect with the forecast site drainage basins
- Data download code
- Via runtime repository: drivendataorg/water-supply-forecast-rodeo-runtime
- Direct API access permitted
- For additional items, you are permitted to directly download via the Planetary Computer API
Land and Elevation
Copernicus DEM GLO-90
- Description
- The Copernicus Digital Elevation Model (DEM) is an elevation dataset that represents the surface of the Earth, including buildings, infrastructure, and vegetation. The data comes from the TanDEM-X mission. The GLO-90 data product has a horizontal resolution of approximately 90 meters. For more information, see this product's catalog entry and example notebook from the Microsoft Planetary Computer.
- Approved data source
- Microsoft Planetary Computer (https://planetarycomputer.microsoft.com/api/stac/v1/collections/cop-dem-glo-90)
- Direct API access permitted
- You are permitted to directly download data via the Planetary Computer API
National Land Cover Database (NLCD) Urban Imperviousness
- Description
- The National Land Cover Database (NLCD) is a set of data products on land cover and land cover change for the contiguous United States published by USGS and MRLC. Urban imperviousness refers to surfaces which are water resistant. The 2021 NLCD urban imperviousness product is approved for use in the challenge. You can read more about this product here. Important: NLCD is an update-based data product with releases corresponding to the year of the source imagery: 2001, 2006, 2011, 2016, 2019, and 2021. NLCD releases take many years to prepare, and the actual release date is typically several years later than the source imagery year. For example, the 2011 product was not available until 2014-03-31. In order to reflect operational conditions when performing inference, you must use only the latest epoch available based on release date. For example, if making predictions for the 2013-01-01 issue date, the latest release available at that time was the 2006 product (released on 2011-02-16), and so the latest available epoch would have been 2006. A CSV file nlcd_release_dates.csv containing release dates for each version is available on the data download page.
- Approved data source
- MRLC (19.54 GB, ZIP archive)
- Hindcast data available
- The 2021 NLCD imperviousness ZIP archive (NLCD_impervious_2021_release_all_files_20230630.zip) and companion nlcd_release_dates.csv are directly available in the mounted data drive
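The release-date rule described above can be sketched as a lookup that keeps only epochs released on or before the issue date and returns the most recent one. The table below contains only the two release dates stated on this page. In practice the full set would come from nlcd_release_dates.csv, whose column names are not shown here. The function name is hypothetical, and counting a release on the issue date itself as available is an assumption.

```python
from datetime import date

# Only the release dates stated on this page; load the full table from
# nlcd_release_dates.csv for real use.
KNOWN_RELEASES = {
    2006: date(2011, 2, 16),
    2011: date(2014, 3, 31),
}


def latest_available_epoch(issue_date, releases=KNOWN_RELEASES):
    """Return the newest epoch released on or before issue_date, or None."""
    available = [
        epoch for epoch, released in releases.items() if released <= issue_date
    ]
    return max(available) if available else None


print(latest_available_epoch(date(2013, 1, 1)))
```

With the two dates above, an issue date of 2013-01-01 resolves to the 2006 epoch, matching the example in the description.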
BasinATLAS Basin Attributes
- Description
- The BasinATLAS dataset, a part of the HydroSHEDS database, is a collection of hydrological, physiographic, climate, land cover, geological, and anthropogenic variables describing sub-basins globally. Version 10 of the BasinATLAS dataset is approved for use in the challenge. You can read more about BasinATLAS here and see the detailed catalog of variables here. Important: BasinATLAS variables are derived from a diverse set of source datasets, each with different time coverage. For any BasinATLAS variables used, you should clearly document the source data provenance in your model report and justify why the variable does not leverage future data in a way that is unrealistic for operational use or leak information about the test set.
- Approved data source
- HydroSHEDS via Figshare (2.7 GB, geodatabase format)
- Hindcast data available
- BasinATLAS v10 compressed geodatabase (BasinATLAS_Data_v10.gdb.zip) is directly available in the mounted data drive
Don't see a data source you want to use? Please see the documentation on requesting approval for additional data.