Water Supply Forecast Rodeo: Final Prize Stage

Water managers in the Western U.S. rely on accurate water supply forecasts to better operate facilities and mitigate drought. Help the Bureau of Reclamation improve seasonal water supply estimates in this probabilistic forecasting challenge! [Final Prize Stage] #climate

$400,000 in prizes
Completed July 2024
34 joined

Problem description

The aim of the Water Supply Forecast Rodeo challenge is to create a probabilistic forecast model for the 0.10, 0.50, and 0.90 quantiles of seasonal water supply at 26 different hydrologic sites. Accurate forecasts with well-characterized uncertainty will help water resources managers better operate facilities and manage drought conditions in the American West.

In this Final Prize Stage, you will submit cross-validation results and a final model report. Complete details on the problem setup, data, submissions, and evaluation are provided below.

Forecasting task

In this challenge, the target variable is cumulative streamflow volume, measured in thousand acre-feet (KAF), at selected locations in the Western United States. Streamflow is the volume of water flowing in a river at a specific point, and seasonal water supply forecasting is the practice of predicting the cumulative streamflow—the total volume of water—over a specified period of time (the "season" or "forecast period"). The forecast season of interest typically covers the spring and summer. For most of the sites used in this challenge, the forecast season is April through July.

The date that the forecast is made is called the issue date. For example, a forecast for the April–July season issued on March 1 for a given site would use the best available data as of March 1 to predict the cumulative volume that flows through that site later that year from April through July. In this challenge, you will issue forecasts on the 1st, 8th, 15th, and 22nd of each month from January through July.
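For concreteness, the full schedule of 28 issue dates for a single year can be enumerated as follows (a minimal sketch; the year 2024 is only an illustration):

```python
import pandas as pd

# Enumerate the 28 issue dates for one (illustrative) year: the 1st,
# 8th, 15th, and 22nd of each month from January through July.
issue_dates = [
    pd.Timestamp(year=2024, month=month, day=day)
    for month in range(1, 8)   # January (1) through July (7)
    for day in (1, 8, 15, 22)
]
assert len(issue_dates) == 28
```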

Note that some issue dates will overlap with the forecast season. A portion of the seasonal water supply for some of these issue dates will be known, because the naturalized flow for full months that have passed before the issue date will be available (see "Antecedent monthly naturalized flow" section below). However, those forecasts will still have an unknown portion that is the residual (remaining) naturalized flow. For these forecasts, the target variable is still the total seasonal water supply and not just the residual. If you are predicting just the residual, you should add the known months' naturalized flow in order to generate the seasonal total. All forecasts for a given year and a given site share a single ground truth value.
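As a concrete illustration with made-up numbers: for an April–July season and a May 15 issue date, April's naturalized flow is already known, and a residual-style model must add it back:

```python
# Hypothetical numbers for an April-July season with a May 15 issue date.
known_april_volume = 120.0   # naturalized flow already observed for April (KAF)
predicted_residual = 480.0   # model output for the remaining May-July flow (KAF)

# The forecast submitted must be the full seasonal total, not the residual.
seasonal_total = known_april_volume + predicted_residual   # 600.0 KAF
```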


The streamflow measurements of interest are of natural flow (also called unimpaired flow). This refers to the amount of water that would have flowed without influence from upstream dams or diversions. Observed flow measurements are naturalized by adding or subtracting known upstream influences.

Quantile forecasting

Rather than predicting a single value for each forecast, your task is to produce a quantile forecast: the 0.10, 0.50 (median), and 0.90 quantiles of cumulative streamflow volume. Forecasting quantiles acknowledges the uncertainty inherent in predictions and provides a range representing the distribution of possible outcomes. The 0.50 quantile (median) prediction is the central tendency of the distribution, while the 0.10 and 0.90 quantile predictions are the lower and upper bounds of a centered 80% prediction interval.

Note that quantiles have the opposite notation compared with the "probability of exceedance" used in many of the operational water supply forecasts. For example, a 90% probability of exceedance is equivalent to the 0.10 quantile. Measurements should fall below the true 0.10 quantile value 10% of the time, and they should exceed it 90% of the time. For more information on probability of exceedance, see the section "Interpreting water supply forecasts" of this reference.
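The challenge does not prescribe a modeling approach, but one common way to produce quantile forecasts is to fit one model per quantile with a pinball (quantile) loss. A minimal sketch on synthetic data using scikit-learn:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                                  # synthetic features
y = X @ np.array([3.0, 1.5, 0.0, -2.0]) + rng.normal(scale=2.0, size=500)

# Fit one model per target quantile using the pinball (quantile) loss.
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
    for q in (0.10, 0.50, 0.90)
}
predictions = {q: model.predict(X) for q, model in models.items()}
```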

Labels (ground truth data)

The ground truth data are derived from naturalized streamflow data provided by the Natural Resources Conservation Service (NRCS) and the National Weather Service River Forecast Centers (RFCs) and are available in CSV format on the data download page. Each row will represent the seasonal water supply for one year's season at one site.

In the Final Prize Stage, the cross_validation_labels.csv file contains the seasonal water supply values for the 20-year cross-validation evaluation period from 2004 through 2023 for the 26 sites used in the challenge. The file contains the following columns:

  • site_id (str) — identifier for a particular site.
  • year (int) — the year whose season the seasonal water supply measurement corresponds to.
  • volume (float) — seasonal water supply volume in thousand acre-feet (KAF).

An additional file prior_historical_labels.csv contains the seasonal water supply values for water years before 2004, in case you want to use them as additional training data.

Metadata

An additional CSV file, metadata.csv, provides important metadata about the 26 sites. The metadata file contains the following columns:

  • site_id (str) — identifier used in this challenge for a particular site.
  • season_start_month (int) — month that is the beginning of the forecast season, inclusive.
  • season_end_month (int) — month that is the end of the forecast season, inclusive.
  • elevation (float) — elevation of site in feet.
  • latitude (float) — latitude of site in degrees north.
  • longitude (float) — longitude of site in degrees east.
  • drainage_area (float) — estimated area of the drainage basin from USGS in square miles. Note that not all sites have a drainage area estimate. See basin polygons in the geospatial data as an alternative way to estimate the drainage area.
  • usgs_id (str) — U.S. Geological Survey (USGS) identifier for this monitoring location in the USGS streamgaging network. This is an 8-digit identifier. May be missing if there is no USGS streamgage at this location.
  • usgs_name (str) — USGS name for this monitoring location in the USGS streamgaging network. May be missing if there is no USGS streamgage at this location.
  • nrcs_id (str) — NRCS triplet-format identifier for this monitoring location. May be missing if the NRCS does not track station data for this location.
  • nrcs_name (str) — NRCS name for this site's location. May be missing if the NRCS does not track station data for this location.
  • rfc_id (str) — RFC identifier for this site's location. May be missing if no RFC issues a forecast for this location.
  • rfc_name (str) — RFC name for this site's location. May be missing if no RFC issues a forecast for this location.
  • rfc (str) — the specific RFC whose area of responsibility this site is located in. May be missing if no RFC issues a forecast for this location.

Update November 2, 2023: The usgs_id column has been corrected from an integer type to a string type. Correct USGS identifiers are 8 digits and may have leading zeros. To correctly read this column using pandas.read_csv, use the keyword argument dtype={"usgs_id": "string"}.
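For example (assuming metadata.csv is in the working directory):

```python
import pandas as pd

# Read usgs_id as a string so the 8-digit IDs keep their leading zeros.
metadata = pd.read_csv("metadata.csv", dtype={"usgs_id": "string"})
```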

Basin geospatial data

Geospatial vector data related to each of the 26 sites is available in a multi-layered GeoPackage file. You can read this data using libraries like fiona or geopandas; a short sketch follows the layer descriptions below. This data file contains two layers named basins and sites.

The basins layer contains one vector feature per site that delineates each site's drainage basin. A drainage basin refers to the area of land where all flowing surface water converges to a single point. A drainage basin is determined by topography and the fact that water flows downhill. Features in this layer contain the following properties:

  • site_id (str) — identifier used in this challenge for a particular site.
  • name (str) — nicely formatted human-readable name for the site.
  • area (float) — area of the basin polygon in square miles.

The sites layer contains one vector feature per site, giving the point location of the site. Features in this layer contain the following properties:

  • site_id (str) — identifier used in this challenge for a particular site.
  • name (str) — nicely formatted human-readable name for the site.
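
A minimal sketch of reading both layers with geopandas; the file name geospatial.gpkg is an assumption, so substitute the actual file name from the data download page:

```python
import geopandas as gpd

# Read each named layer of the GeoPackage separately.
basins = gpd.read_file("geospatial.gpkg", layer="basins")  # one polygon per site
sites = gpd.read_file("geospatial.gpkg", layer="sites")    # one point per site
```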


Map showing the site locations and drainage basins for the 26 sites in the challenge. Sites are shown as blue points, and drainage basin polygons are shown in purple.

Features (predictor data)

Seasonal water supply is influenced by a range of hydrological, weather, and climate factors. Relevant data sources for streamflow volume prediction include antecedent streamflow measurements, snowpack estimates, short and long-term meteorological forecasts, and climate teleconnection indices. You are encouraged to experiment with and use a variety of relevant data sources in your modeling. However, only approved data sources are permitted to be used as input into models for valid submissions. See the Approved Data Sources page for an up-to-date list of approved data sources.

For the Final Prize Stage, you will be responsible for downloading your own feature data from approved sources for the cross-validation. Example code for downloading and reading the approved data sources is available in the challenge data and runtime repository. You should clearly document your data download process, and track any code you use. Winning solutions will need to be reproduced as a condition for being awarded prizes.

Antecedent monthly naturalized flow

The past monthly time series observations of the forecast target variable are available to be used as autoregressive modeling features. A dataset of antecedent monthly naturalized flow is available on the data download page. Please note that only 23 of the 26 sites have monthly data available.

For the Final Prize Stage, there are two files provided:

  • cross_validation_monthly_flow.csv — contains the monthly naturalized flow time series for the 20-year hindcast period.
  • prior_historical_monthly_flow.csv — contains the monthly naturalized flow time series for years before 2004.

For each forecast year, the dataset includes values from October of the previous year through the month before the end of the forecast season (i.e., June for sites with seasons through July, and May for the site with a season through June). This time range follows the standard concept of a "water year" in hydrology.

The data includes the following columns:

  • site_id (str) — identifier for a particular site.
  • forecast_year (int) — the year whose forecast season this row is relevant for.
  • year (int) — the year whose month the total streamflow value is for.
  • month (int) — the month that the total streamflow value is for.
  • volume (float) — total monthly streamflow volume in thousand acre-feet (KAF). May be missing for some months for some sites.

For example, the row with year=2004 and month=11 contains the total streamflow volume for the month of November 2004. That row has forecast_year=2005 because it is an observation that is relevant to forecasting the seasonal water supply for the 2005 water year.
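As an example of building an autoregressive feature from this file, the sketch below sums each site's naturalized flow from October through March of the water year. The column names are as documented above; the March cutoff is illustrative, since in practice the cutoff depends on the issue date:

```python
import pandas as pd

flow = pd.read_csv("cross_validation_monthly_flow.csv")

# Months that fall before April of the forecast year: October-December
# rows carry the previous calendar year, while January-March rows carry
# the forecast year itself.
before_april = (flow["year"] < flow["forecast_year"]) | (flow["month"] < 4)

oct_through_mar = (
    flow.loc[before_april]
    .groupby(["site_id", "forecast_year"])["volume"]
    .sum(min_count=1)           # stay NaN when every month is missing
    .rename("volume_oct_through_mar")
    .reset_index()
)
```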

Supplemental NRCS naturalized flow

Supplemental training data from 592 other NRCS monitoring sites, distinct from the 26 sites in the challenge, has been uploaded to the data download page. You may use this data as additional training data for your model. There are three files provided:

  • cross_validation_supplementary_nrcs_monthly_flow.csv — contains the monthly naturalized flow time series for the supplemental sites from October through July for the 20-year hindcast period.
  • prior_historical_supplementary_nrcs_monthly_flow.csv — contains the monthly naturalized flow time series for the supplemental sites from October through July for years before 2004.
  • supplementary_nrcs_metadata.csv — contains metadata about the supplemental sites.

You can join the monthly naturalized flow files to the metadata file using the nrcs_id identifier column.
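
For example, a minimal sketch of that join with pandas (assuming the files are in the working directory):

```python
import pandas as pd

supp_flow = pd.read_csv("cross_validation_supplementary_nrcs_monthly_flow.csv")
supp_meta = pd.read_csv("supplementary_nrcs_metadata.csv")

# Attach site metadata to each monthly flow record via the shared key.
supp = supp_flow.merge(supp_meta, on="nrcs_id", how="left")
```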

Hydrologic Unit Codes (HUC)

Added January 9, 2024.

When processing certain feature data sources, such as the spatially averaged data from UA/SWANN, you may need to associate the forecast sites or drainage basins in the challenge to USGS Hydrologic Unit Codes (HUC). HUC definitions are a type of static metadata that are not considered to be a feature, and you do not need to use a specific approved source, though we ask that you use one that is reasonably official. Please make sure that your process for selecting or associating HUCs to forecast sites is clearly documented and reproducible in your model report and training code.
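One reproducible way to make that association is a spatial join between the basin polygons and a HUC boundary layer. This is a sketch only: the HUC file name wbdhu8.shp and its huc8 column are assumptions that depend on which official source you choose:

```python
import geopandas as gpd

basins = gpd.read_file("geospatial.gpkg", layer="basins")
huc8 = gpd.read_file("wbdhu8.shp")   # hypothetical HUC-8 boundary layer

# Spatially join each basin polygon to every HUC-8 unit it intersects,
# aligning coordinate reference systems first.
basin_hucs = gpd.sjoin(
    basins.to_crs(huc8.crs),
    huc8[["huc8", "geometry"]],      # "huc8" column name is an assumption
    how="left",
    predicate="intersects",
)
```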


If you have any questions, please ask in this forum thread.

Updates to your model from previous stages

You are allowed to make updates to the model that you submitted in previous stages. This includes changes to your modeling approach, such as changes to the model architecture, hyperparameters, or features.

Submissions

In the Final Prize Stage, you are required to make two types of submissions. See the linked pages below for detailed submission requirements.

  1. Cross-validation predictions submission format
  2. Final model report submission format

Overall evaluation

Evaluation criteria

Forecast Skill (Hindcast cross-validation) (30%)
Solutions will be evaluated based on cross-validation results over the 20-year hindcast period.
Forecast Skill (Forecast) (10%)
Solutions will be evaluated based on their predictions' quantile score from the Forecast Stage evaluation.
Rigor (20%)
To what extent is the solution methodology based on a sound physical and/or statistical foundation? Judges will consider how methodological decisions support or limit different aspects of rigor, such as avoiding overfitting, avoiding data leakage, assessing and mitigating biases, and potential for the model to produce valid predictions in an applied context. Judges will also consider whether any aspects of the methodology are physically implausible.
Innovation (10%)
To what extent does the solution use datasets or modeling techniques that advance the state-of-the-art in water supply forecasting? Judges will consider innovation in any aspect of the technical approach, including but not limited to the data sources used, feature engineering, algorithm and architecture selection, or the approach to training and evaluation.
Generalizability (10%)
How well does the solution generalize to the varied sites and conditions tested in the challenge? Judges will consider reported information and hypotheses about the model’s performance under different geographic, environmental, and temporal conditions.
Efficiency & Scalability (10%)
How computationally efficient is the solution, and how well could it scale to an increased number of sites? Judges will consider all aspects of efficiency such as the reported total test runtime (including data processing), training resource costs (e.g., hardware, memory usage), and any reported potential for efficiency improvements or optimizations.
Clarity (10%)
How clearly are model mechanics exposed, communicated, and visualized in the report? Judges will consider how well organized and presented the report is.

Forecast skill metric

The "Forecast Skill" categories in the evaluation criteria will be evaluated based on the quantile loss score of your predictions.

Primary metric: Quantile loss

Quantile loss, also known as pinball loss, assesses the accuracy of a quantile forecast by quantifying the disparity between predicted quantiles (percentiles) and the actual values. This is an error metric, so a lower value is better. Mean quantile loss is the quantile loss averaged over all observations for a given quantile and is implemented in scikit-learn. We are using a quantile loss that has been multiplied by a factor of 2 so that the 0.5 quantile loss is equivalent to mean absolute error.

$$ \text{Mean Quantile Loss}(\tau, y, \hat{y}) = \frac{2}{n} \sum_{i=1}^{n} \left[ \tau \cdot \max(y_i - \hat{y}_i, 0) + (1 - \tau) \cdot \max(\hat{y}_i - y_i, 0) \right] $$

Where:

  • $\tau$ represents the desired quantile (e.g., 0.1 for the 10th percentile).
  • $y_i$ represents the actual observed value for the $i$th observation.
  • $\hat{y}_i$ represents the predicted value for the $\tau$ quantile of the $i$th observation.
  • $n$ represents the total number of observations.

For this challenge, mean quantile loss will be calculated for each $\tau \in \{0.10, 0.50, 0.90\}$ and averaged. The final score will be calculated as:

$$ \text{Averaged Mean-Quantile-Loss (MQL)} = \frac{\text{MQL}_{0.10} + \text{MQL}_{0.50} + \text{MQL}_{0.90}}{3} $$
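
A minimal sketch of this metric using scikit-learn's mean_pinball_loss (note the factor of 2 from the definition above):

```python
import numpy as np
from sklearn.metrics import mean_pinball_loss

def averaged_mean_quantile_loss(y_true, predictions):
    """Challenge metric: 2x mean pinball loss, averaged over the quantiles.

    `predictions` maps each quantile in {0.10, 0.50, 0.90} to an array of
    predicted values aligned with `y_true`.
    """
    losses = [
        2 * mean_pinball_loss(y_true, predictions[q], alpha=q)
        for q in (0.10, 0.50, 0.90)
    ]
    return float(np.mean(losses))
```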

Secondary metric: Interval coverage

Interval coverage is the proportion of true values that fall within the predicted interval (between the 0.10 and 0.90 quantile predictions). This metric provides information about the statistical calibration of the predicted intervals. It will not be used directly for ranking; well-calibrated 0.10 and 0.90 quantile forecasts should have a coverage value of 0.8.

$$ \text{Interval Coverage} = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{(\hat{y}_{0.1,i} \leq y_i \leq \hat{y}_{0.9,i})} $$

Where:

  • $\mathbf{1}$ is an indicator function.
  • $y_i$ represents the actual observed value for the $i$th observation.
  • $\hat{y}_{\tau, i}$ represents the predicted value for the $\tau$ quantile of the $i$th observation.
  • $n$ represents the total number of observations.
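
A minimal sketch of this metric with NumPy:

```python
import numpy as np

def interval_coverage(y_true, y_pred_10, y_pred_90):
    """Proportion of observations inside the [0.10, 0.90] prediction interval."""
    y_true = np.asarray(y_true)
    inside = (np.asarray(y_pred_10) <= y_true) & (y_true <= np.asarray(y_pred_90))
    return float(inside.mean())
```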

Subcategory bonus prizes

Subcategory bonus prizes are awarded based on performance on particular subsets of the cross-validation hindcast dataset. The subcategories are defined as follows:

Regional Bonus Prize: Cascades

Sites located in the Cascades mountain range in the Pacific Northwest. This includes the following sites:

  • skagit_ross_reservoir
  • stehekin_r_at_stehekin
  • green_r_bl_howard_a_hanson_dam
  • detroit_lake_inflow

Regional Bonus Prize: Sierra Nevada

Sites located in the Sierra Nevada mountain range in California. This includes the following sites:

  • san_joaquin_river_millerton_reservoir
  • merced_river_yosemite_at_pohono_bridge
  • american_river_folsom_lake

Regional Bonus Prize: Colorado Headwaters

Smaller, high-elevation basins in the Rocky Mountains in Colorado. This includes the following sites:

  • ruedi_reservoir_inflow
  • dillon_reservoir_inflow
  • taylor_park_reservoir_inflow
  • animas_r_at_durango

Challenging Basins Bonus Prize

Sites in basins with low water supply volume relative to basin area, relatively high variability in water supply volume, and a relatively low influence of snowmelt on water supply.

  • owyhee_r_bl_owyhee_dam
  • virgin_r_at_virtin
  • pecos_r_nr_pecos

Early Lead Time Bonus Prize

Issue dates from January 1 through March 15.

Bonus prize evaluation criteria

Forecast Skill (Hindcast cross-validation) (70%)
Solutions will be evaluated based on cross-validation results over the 20-year hindcast period, subset to the particular observations associated with that bonus prize.
Rigor (30%)
To what extent is the solution methodology based on a sound physical and/or statistical foundation? Judges will consider how methodological decisions support or limit different aspects of rigor, such as avoiding overfitting, avoiding data leakage, assessing and mitigating biases, and potential for the model to produce valid predictions in an applied context. Judges will also consider whether any aspects of the methodology are physically implausible.

Good luck

Good luck and felicitous forecasting! If you have any questions, you can always visit the competition forum!