Water Supply Forecast Rodeo: Development Arena

START HERE! Water managers in the Western U.S. rely on accurate water supply forecasts to better operate facilities and mitigate drought. Help the Bureau of Reclamation improve seasonal water supply estimates in this probabilistic forecasting challenge! #climate

Development Arena
dec 2023

Problem description

The aim of the Water Supply Forecast Rodeo challenge is to create a probabilistic forecast model for the 0.10, 0.50, and 0.90 quantiles of seasonal water supply at 26 different hydrologic sites. In this first challenge stage, the Hindcast Stage, you will use historical data to simulate forecasts in the past. Accurate forecasts with well-characterized uncertainty will help water resources managers better operate facilities and manage drought conditions in the American West.

Forecasting Task

In this challenge, the target variable is cumulative streamflow volume, measured in thousand acre-feet (KAF), at selected locations in the Western United States. Streamflow is the volume of water flowing in a river at a specific point, and seasonal water supply forecasting is the practice of predicting the cumulative streamflow—the total volume of water—over a specified period of time (the "season" or "forecast period"). The forecast season of interest typically covers the spring and summer. For most of the sites used in this challenge, the forecast season is April through July.

The date that the forecast is made is called the issue date. For example, a forecast for the April–July season issued on March 1 for a given site would use the best available data as of March 1 to predict the cumulative volume that flows through that site later that year from April through July. In this challenge, you will issue forecasts on the 1st, 8th, 15th, and 22nd of each month from January through July.
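The issue-date schedule described above can be enumerated with a few lines of Python. This is a minimal sketch; the names `ISSUE_DAYS`, `ISSUE_MONTHS`, and `issue_dates` are illustrative and not part of any challenge-provided code.

```python
from datetime import date

ISSUE_DAYS = (1, 8, 15, 22)   # days of the month on which forecasts are issued
ISSUE_MONTHS = range(1, 8)    # January through July, inclusive


def issue_dates(year: int) -> list[date]:
    """Return every issue date for one forecast year, in calendar order."""
    return [date(year, month, day) for month in ISSUE_MONTHS for day in ISSUE_DAYS]


dates_2005 = issue_dates(2005)
print(len(dates_2005))                 # 28 issue dates per site per year
print(dates_2005[0], dates_2005[-1])   # 2005-01-01 2005-07-22
```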

Note that some issue dates fall within the forecast season. For these forecasts, the target variable is still the overall seasonal water supply, so all forecasts for a given year and a given site share a single ground truth value. However, a portion of the seasonal water supply may already be known for full months that have passed before the issue date (see the "Antecedent monthly naturalized flow" section below). For those forecasts, the unknown portion is only the residual (remaining) naturalized flow.
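To make the "known portion" idea concrete, the sketch below lists which season months are complete before an issue date and sums their volumes. It assumes that the monthly value for a completed month is available by the issue date and that no month is missing; the function names and the volumes used are made up for illustration.

```python
from datetime import date


def completed_season_months(issue_date: date, season_start_month: int,
                            season_end_month: int) -> list[int]:
    """Season months that have fully elapsed before issue_date."""
    season = range(season_start_month, season_end_month + 1)
    return [month for month in season if month < issue_date.month]


def known_volume(issue_date: date, season_start_month: int,
                 season_end_month: int, monthly_volume: dict[int, float]) -> float:
    """Sum monthly volumes (KAF) for the completed season months.

    monthly_volume maps month number -> volume and must contain every
    completed season month (an assumption of this sketch).
    """
    months = completed_season_months(issue_date, season_start_month, season_end_month)
    return sum(monthly_volume[month] for month in months)


# Hypothetical April and May volumes for an April-July season.
volumes = {4: 100.0, 5: 250.0}
print(completed_season_months(date(2005, 6, 8), 4, 7))   # [4, 5]
print(known_volume(date(2005, 6, 8), 4, 7, volumes))     # 350.0
```

With a known portion in hand, one option is to model only the residual and add the known volume back to each predicted quantile. Whether that helps is an empirical question.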

Here are some examples of operational water supply forecasts:

The streamflow measurements of interest are of natural flow (also called unimpaired flow). This refers to the amount of water that would have flowed without influence from upstream dams or diversions. Observed flow measurements are naturalized by adding or subtracting known upstream influences.

Quantile Forecasting

Rather than predicting a single value for each forecast, your task is to produce a quantile forecast: the 0.10, 0.50 (median), and 0.90 quantiles of cumulative streamflow volume. Forecasting quantiles acknowledges the uncertainty inherent in predictions and provides a range representing a distribution of possible outcomes. The 0.50 quantile (median) prediction is the central tendency of the distribution, while the 0.10 and 0.90 quantile predictions are the lower and upper bounds of a central 80% prediction interval.

Note that quantiles have the opposite notation compared with the "probability of exceedance" used in many of the operational water supply forecasts. For example, a 90% probability of exceedance is equivalent to the 0.10 quantile. Measurements should fall below the true 0.10 quantile value 10% of the time, and they should exceed it 90% of the time. For more information on probability of exceedance, see the section "Interpreting water supply forecasts" of this reference.
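The relationship between the two notations is a simple complement. As a small illustration (the function name is hypothetical):

```python
def exceedance_to_quantile(exceedance_percent: float) -> float:
    """Convert a probability of exceedance (in percent) to a quantile level."""
    # Rounding removes floating-point noise such as 0.09999999999999998.
    return round(1.0 - exceedance_percent / 100.0, 10)


print(exceedance_to_quantile(90))  # 0.1
print(exceedance_to_quantile(50))  # 0.5
print(exceedance_to_quantile(10))  # 0.9
```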

Labels (ground truth data)

The ground truth data are derived from naturalized streamflow data provided by the Natural Resources Conservation Service (NRCS) and the National Weather Service River Forecast Centers (RFCs) and are available in CSV format on the data download page. Each row represents the seasonal water supply for one year's season at one site. The file contains the full history available for each of the 26 sites used in the challenge, with certain years removed for testing.

For the Hindcast Stage, we will be using hold-out validation with 10 years of data in the test set (odd years from 2005–2023). Training on years in the test set is prohibited. Teams that are found to have trained submitted models on test years will be disqualified. Please use the provided training labels to ensure that your models are trained on appropriate years.

Hindcast test years will be released after the conclusion of the Hindcast Stage so that they are available for training models to be used in the Forecast Stage. For more detailed guidance on avoiding leakage from test years with feature/predictor data, see the "Independent water years" section below.
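A simple guard can help catch accidental training on held-out years. The sketch below encodes the test years stated above (odd years from 2005 through 2023); the constant and function names are illustrative, not part of the challenge code.

```python
# Odd years from 2005 through 2023, inclusive: the 10 Hindcast Stage test years.
HINDCAST_TEST_YEARS = frozenset(range(2005, 2024, 2))


def check_training_years(training_years) -> None:
    """Raise ValueError if any training year is a Hindcast Stage test year."""
    leaked = sorted(set(training_years) & HINDCAST_TEST_YEARS)
    if leaked:
        raise ValueError(f"Training data includes test years: {leaked}")


print(sorted(HINDCAST_TEST_YEARS))
check_training_years([2004, 2006, 2008])  # passes silently
```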

The train.csv file contains the following columns:

  • site_id (str) — identifier for a particular site.
  • year (int) — the year whose season the seasonal water supply measurement corresponds to.
  • volume (float) — seasonal water supply volume in thousand acre-feet (KAF).

Please note that the ground truth data includes some provisional or estimated data, especially measurements from 2023. This data may be subject to minor revisions. In order to evaluate on the most accurate ground truth measurements available, the Hindcast Stage final evaluation test data will be the data published by NRCS and the RFCs as of December 5, 2023. In general, data used in the challenge may not exactly match all final values when they are eventually approved by their respective agencies.

Update November 29, 2023: train.csv has been updated with the November 28, 2023 values. Please see the data download page for the updated file.

Metadata

An additional CSV file, metadata.csv, provides important metadata about the 26 sites. The metadata file contains the following columns:

  • site_id (str) — identifier used in this challenge for a particular site.
  • season_start_month (int) — month that is the beginning of the forecast season, inclusive.
  • season_end_month (int) — month that is the end of the forecast season, inclusive.
  • elevation (float) — elevation of site in feet.
  • latitude (float) — latitude of site in degrees north.
  • longitude (float) — longitude of site in degrees east.
  • drainage_area (float) — estimated area of the drainage basin from USGS in square miles. Note that not all sites have a drainage area estimate. See basin polygons in the geospatial data as an alternative way to estimate the drainage area.
  • usgs_id (str) — U.S. Geological Survey (USGS) identifier for this monitoring location in the USGS streamgaging network. This is an 8-digit identifier. May be missing if there is no USGS streamgage at this location.
  • usgs_name (str) — USGS name for this monitoring location in the USGS streamgaging network. May be missing if there is no USGS streamgage at this location.
  • nrcs_id (str) — NRCS triplet-format identifier for this monitoring location. May be missing if the NRCS does not track station data for this location.
  • nrcs_name (str) — NRCS name for this site's location. May be missing if the NRCS does not track station data for this location.
  • rfc_id (str) — RFC identifier for this site's location. May be missing if no RFC issues a forecast for this location.
  • rfc_name (str) — RFC name for this site's location. May be missing if no RFC issues a forecast for this location.
  • rfc (str) — the specific RFC whose area of responsibility contains this site. May be missing if no RFC issues a forecast for this location.

Update November 2, 2023: The usgs_id column has been corrected from an integer type to a string type. Correct USGS identifiers are 8 digits and may have leading zeros. To correctly read this column using pandas.read_csv, use the keyword argument dtype={"usgs_id": "string"}.
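The following sketch shows the effect of that keyword argument on a tiny in-memory CSV. The rows and identifiers are made up for illustration and do not come from metadata.csv, which has many more columns.

```python
import io

import pandas as pd

# Made-up rows for illustration only.
sample_csv = io.StringIO(
    "site_id,usgs_id\n"
    "example_site_a,01234567\n"
    "example_site_b,\n"
)

# Reading usgs_id as a string preserves leading zeros; without the dtype
# argument, pandas would infer a numeric type and drop them.
metadata = pd.read_csv(sample_csv, dtype={"usgs_id": "string"})
print(metadata["usgs_id"].tolist())
```

Sites without a USGS streamgage come through as missing values in that column, so downstream code should check for them with `pd.isna` before using the identifier.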

Basin geospatial data

Geospatial vector data related to each of the 26 sites is available in a multi-layered GeoPackage file. You can read this data using libraries like fiona or geopandas. This data file contains two layers:

A layer named basins that contains one vector feature per site that delineates each site's drainage basin. A drainage basin refers to the area of land where all flowing surface water converges to a single point. A drainage basin is determined by topography and the fact that water flows downhill. Features in this layer contain the following properties:

  • site_id (str) — identifier used in this challenge for a particular site.
  • name (str) — nicely formatted human-readable name for the site.
  • area (float) — area of the basin polygon in square miles.

A layer named sites that contains one vector feature per site that is the point location of the site. Features in this layer contain the following properties:

  • site_id (str) — identifier used in this challenge for a particular site.
  • name (str) — nicely formatted human-readable name for the site.


Figure: Map of the provided geospatial data, showing site location and drainage basin for the 26 sites in the challenge. Sites are shown as blue points, and drainage basin polygons are shown in purple.

Features (predictor data)

Seasonal water supply is influenced by a range of hydrological, weather, and climate factors. Relevant data sources for streamflow volume prediction include antecedent streamflow measurements, snowpack estimates, short and long-term meteorological forecasts, and climate teleconnection indices. You are encouraged to experiment with and use a variety of relevant data sources in your modeling. However, only approved data sources are permitted to be used as input into models for valid submissions. See the Approved Data Sources page for an up-to-date list of approved data sources.

If you would like to use additional data, please refer to the later section on requesting approval for additional data for more details.

You will be responsible for downloading your own feature data from approved sources for model training. Example code for downloading and reading the approved data sources is available in the challenge data and runtime repository.

For code execution during the Hindcast Stage Evaluation Arena and during the Forecast Stage, you will be required to use data hosted by DrivenData that is mounted to the code execution runtime. In the case of any exceptions, these will be documented on a per-data-source basis. Format and access for each data source will be documented, and the provided sample code will allow you to download the data in the same way.

Antecedent monthly naturalized flow

The past monthly time series observations of the forecast target variable are available to be used as autoregressive modeling features. A dataset of antecedent monthly naturalized flow is available on the data download page. This data has been split into two files, train_monthly_naturalized_flow.csv and test_monthly_naturalized_flow.csv, corresponding to the train and test forecast years, respectively. Please note that only 23 of the 26 sites have monthly data available.

For each forecast year, the dataset includes values from October of the previous year through the month before the end of the forecast season (i.e., June for sites with seasons through July, and May for the site with a season through June). This time range provided follows the standard concept of a "water year" in hydrology.

The data includes the following columns:

  • site_id (str) — identifier for a particular site.
  • forecast_year (int) - the year whose forecast season this row is relevant for.
  • year (int) — the year whose month the total streamflow value is for.
  • month (int) — the month that the total streamflow value is for.
  • volume (float) — total monthly streamflow volume in thousand acre-feet (KAF). May be missing for some months for some sites.

For example, the row with year=2004 and month=11 contains the total streamflow volume for the month of November 2004. That row has forecast_year=2005 because it is an observation that is relevant to forecasting the seasonal water supply for the 2005 season.

Please note that the naturalized flow data includes some provisional or estimated data, especially measurements from 2023. This data may be subject to minor revisions. As with the ground truth label data, we will update and finalize the time series datasets with the values published by NRCS and the RFCs as of November 28, 2023. In general, data used in the challenge may not exactly match all final values when they are eventually approved by their respective agencies.

Update November 29, 2023: train_monthly_naturalized_flow.csv and test_monthly_naturalized_flow.csv have been updated with the November 28, 2023 values. Please see the data download page for the updated files.

Time and data use

The task in this challenge is not a standard time series modeling task. There are two levels at which to consider time in this challenge, each with its own requirement:

  1. Across water years—water years should be treated independently.
  2. Within water years—no future data should be used with respect to the issue date.

In this challenge, we use the concept of a "water year" to identify years. A water year in hydrology is defined as a 12-month period from October 1 to September 30 identified by the year in which it ends. For example, the 2005 water year begins on October 1, 2004 and ends on September 30, 2005. So, if you are issuing a forecast on 2005-03-15 for the seasonal water supply for 2005, the forecast and the seasonal water supply value are associated with the 2005 water year.
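The water year definition translates directly into code. A minimal sketch (the function name is illustrative):

```python
from datetime import date


def water_year(d: date) -> int:
    """Return the water year (October 1 through September 30) containing d.

    A water year is identified by the calendar year in which it ends.
    """
    return d.year + 1 if d.month >= 10 else d.year


print(water_year(date(2004, 10, 1)))   # 2005
print(water_year(date(2005, 3, 15)))   # 2005
print(water_year(date(2005, 9, 30)))   # 2005
print(water_year(date(2005, 10, 1)))   # 2006
```

The same rule reproduces the `forecast_year` column in the monthly naturalized flow files for the months they cover: a row with year=2004 and month=11 maps to 2005.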

Please review the detailed explanations in the following two sections carefully.

Independent water years

At the level of years, this challenge is treating water years as statistically independent observations rather than as a temporal sequence. You should think about this as more like standard regression modeling and not as time series modeling.

Treating years as statistically independent observations is a common evaluation methodology in seasonal water supply forecasting. For the competition, it allows more data to be available for training, especially for more recent years and for data sources that have a limited period of record. Hydrologically, this is justified because water supply is primarily driven by near-term physical processes such as precipitation and snowmelt.

Some consequences of the independent water years:

  • It is valid to predict on a test year with a model whose training data includes water years in the future. For example, you can train a model on water years 2006, 2008, 2010, etc. and use it to predict on the test water year of 2005.
  • When calculating features, lookback windows should not go further than October 1 of that water year. For example, if predicting for an issue date of 2015-03-15 for the 2015 water year, your lookback window should not use data from before 2014-10-01. Data from before 2014-10-01 belong to the 2014 water year.
  • You should be careful not to treat years as ordered or to use year as a variable. Your model should not make use of the temporal relationship between years.

You are responsible for explicitly parameterizing your code to subset feature data to the appropriate time period. Sample data reading code is provided in the runtime repository for your optional use; it is clearly documented and loads data that is correctly subsetted by time for a given issue date. Winning submissions will be reviewed to ensure they do not violate these constraints and may be disqualified if they are not compliant.

Exceptions to lookback window water year limit

Updated December 13, 2023

Challenge organizers are allowing the exceptions detailed in this section to the rule that lookback windows are limited to October 1. If you make use of such an exception, you must justify its use in your model report and demonstrate that there is no leakage if the lookback window for a training year overlaps with test years. Reports that do not adequately do so will be marked down on the "Rigor" evaluation criterion. Be sure to address the following:

  • Explain why this feature is important to water supply forecasting
  • Explain based on hydrological and physical reasoning why this feature does not result in leakage
  • Demonstrate empirically that inclusion of this feature does not cause overfitting, e.g., cross-validation results where the training data does not overlap with validation data

Exceptions are allowed for the following feature data categories. These categories correspond to headings on the Approved Data Sources page and apply only to data sources within that category.

  • Snowpack
  • Weather and climate products
  • Climate teleconnection indices

These exceptions are being allowed so that solutions may include features that model long-timescale processes that affect seasonal water supply.

No future data within water years

When making a prediction for a specific issue date, your model must not use any future data from the same water year as features. For this challenge, this means that a forecast may only use feature data from before the issue date. For example, if you are issuing a forecast for 2021-03-15, you may only use feature data from 2021-03-14 or earlier.

You are responsible for explicitly parameterizing your code to subset feature data to the appropriate time period. Sample data reading code is provided in the runtime repository. This code is clearly documented and available for your optional use; it will load data that is correctly subsetted by time for a given issue date. Winning submissions will be reviewed to ensure they do not violate these constraints and may be disqualified if they are not compliant.
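Taken together, the two time constraints define a default window of allowable feature data for each issue date: it starts on October 1 at the beginning of the water year and ends on the day before the issue date. The sketch below computes that window. The function name is illustrative, and the sketch does not account for the lookback exceptions allowed for certain data categories.

```python
from datetime import date, timedelta


def feature_window(issue_date: date) -> tuple[date, date]:
    """Return the (first, last) dates of feature data allowed for issue_date."""
    # Water years are identified by the calendar year in which they end.
    wy = issue_date.year + 1 if issue_date.month >= 10 else issue_date.year
    start = date(wy - 1, 10, 1)               # October 1 that opens the water year
    end = issue_date - timedelta(days=1)      # day before the issue date
    return start, end


start, end = feature_window(date(2021, 3, 15))
print(start, end)  # 2020-10-01 2021-03-14
```

Filtering every feature table to this window before computing aggregates is one way to keep the subsetting explicit and easy to review.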

Supplemental NRCS naturalized flow

Updated December 12, 2023

Supplemental training data from 592 other NRCS monitoring sites that are not the 26 sites in the challenge has been uploaded to the data download page. You may use this data as additional training data for your model. There are two files provided:

  • supplementary_nrcs_train_monthly_naturalized_flow.csv—this file contains the monthly naturalized flow time series for the various sites from October through July, excluding test set water years.
  • supplementary_nrcs_metadata.csv—this file contains metadata about the supplemental sites.

You will be able to join between the two files using the nrcs_id identifier column.

Hydrologic Unit Codes (HUC)

Added January 9, 2024.

When processing certain feature data sources, such as the spatially averaged data from UA/SWANN, you may need to associate the forecast sites or drainage basins in the challenge to USGS Hydrologic Unit Codes (HUC). HUC definitions are a type of static metadata that are not considered to be a feature, and you do not need to use a specific approved source, though we ask that you use one that is reasonably official. Please make sure that your process for selecting or associating HUCs to forecast sites is clearly documented and reproducible in your model report and training code.

Some sources for HUC definitions that you may find useful:

If you have any questions, please ask in this forum thread.

Request a data source

Only approved data sources are allowed for generating predictions during hindcasting and forecasting.

One of the goals of the challenge is to solicit innovative leveraging of new data sources to improve the skill of forecast models. You may be interested in incorporating additional supplementary data sources like earth observation and remote sensing data, climate forecasts, and other basin condition estimates. These sources can capture valuable information on seasonal runoff and changing land cover conditions influenced by factors such as prolonged drought, heat stress, pests (e.g., mountain pine beetle), wildfires, land conversion, development, restoration, and climate change impacts.

If you would like to use data sources beyond the approved list, you are welcome to submit an official approval request form and the challenge organizers will review the request. Only select sources that demonstrate a strong case for use will be considered.

To qualify for approval, data sources must meet the following minimum requirements:

  • Is freely and publicly available to all participants.
  • Is produced reliably by an operational data product.
  • Provides clear value beyond existing approved sources.
  • Is technically feasible (e.g., reasonable data volume and processing needed) to integrate into the challenge's code execution runtime.
  • Would be technically feasible (e.g., reasonable data volume and processing needed) for use in an operational forecast.

Any requests to add approved data sources must be received by December 5, 2023 at 23:59:59 UTC to be considered. An announcement will be made to all challenge participants if your data source has been approved for use, and it will be added to the documentation on the Approved Data Sources page.

Performance metric

Primary metric: Quantile loss

Quantile loss, also known as pinball loss, assesses the accuracy of a quantile forecast by quantifying the disparity between predicted quantiles (percentiles) and the actual values. This is an error metric, so a lower value is better. Mean quantile loss is the quantile loss averaged over all observations for a given quantile and is implemented in scikit-learn. We are using a quantile loss that has been multiplied by a factor of 2 so that the 0.5 quantile loss is equivalent to mean absolute error.

$$ \text{Mean Quantile Loss}(\tau, y, \hat{y}) = \frac{2}{n} \sum_{i=1}^{n} \left[ \tau \cdot \max(y_i - \hat{y}_i, 0) + (1 - \tau) \cdot \max(\hat{y}_i - y_i, 0) \right] $$

Where:

  • |$\tau$| represents the desired quantile (e.g., 0.1 for the 10th percentile).
  • |$y_i$| represents the actual observed value for the |$i$|th observation.
  • |$\hat{y}_i$| represents the predicted value for the |$\tau$| quantile of the |$i$|th observation.
  • |$n$| represents the total number of observations.

For this challenge, mean quantile loss will be calculated for each |$\tau \in \{0.10, 0.50, 0.90\}$| and averaged. The final score will be calculated as:

$$ \text{Averaged Mean-Quantile-Loss (MQL)} = \frac{\text{MQL}_{0.10} + \text{MQL}_{0.50} + \text{MQL}_{0.90}}{3} $$
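For local validation, the two formulas above can be written out in plain Python. This is a sketch, not the official scoring code; the function names and the example values are made up. The scikit-learn implementation mentioned earlier is, to the best of our knowledge, `sklearn.metrics.mean_pinball_loss`, which would need to be multiplied by 2 to match the scaling used here.

```python
def mean_quantile_loss(tau: float, y_true: list[float], y_pred: list[float]) -> float:
    """Mean quantile (pinball) loss, scaled by 2 as in the challenge definition."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * max(y - y_hat, 0.0) + (1.0 - tau) * max(y_hat - y, 0.0)
    return 2.0 * total / len(y_true)


def averaged_mql(y_true, pred_10, pred_50, pred_90) -> float:
    """Average of the mean quantile losses for the 0.10, 0.50, and 0.90 quantiles."""
    return (
        mean_quantile_loss(0.10, y_true, pred_10)
        + mean_quantile_loss(0.50, y_true, pred_50)
        + mean_quantile_loss(0.90, y_true, pred_90)
    ) / 3.0


# Made-up observations and predictions, in KAF.
y_true = [100.0, 200.0]
pred_10 = [80.0, 150.0]
pred_50 = [110.0, 180.0]
pred_90 = [130.0, 260.0]

# With the factor of 2, the 0.50 quantile loss equals the mean absolute error.
print(mean_quantile_loss(0.50, y_true, pred_50))  # 15.0
print(averaged_mql(y_true, pred_10, pred_50, pred_90))
```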

Secondary metric: Interval coverage

Interval coverage is the proportion of true values that fall within the predicted interval (between the 0.10 and 0.90 quantile predictions, inclusive). This metric provides information about the statistical calibration of the predicted intervals. It should not be used directly for ranking; well-calibrated 0.10 and 0.90 quantile forecasts should have a coverage value of 0.8.

$$ \text{Interval Coverage} = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{(\hat{y}_{0.1,i} \leq y_i \leq \hat{y}_{0.9,i})} $$

Where:

  • |$\mathbf{1}$| is an indicator function
  • |$y_i$| represents the actual observed value for the |$i$|th observation.
  • |$\hat{y}_{\tau, i}$| represents the predicted value for the |$\tau$| quantile of the |$i$|th observation.
  • |$n$| represents the total number of observations.
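Interval coverage is equally short to compute. The sketch below follows the formula above, with inclusive bounds; the function name and the example values are made up.

```python
def interval_coverage(y_true: list[float], pred_10: list[float],
                      pred_90: list[float]) -> float:
    """Fraction of observations inside the [0.10, 0.90] predicted interval."""
    if not y_true or not (len(y_true) == len(pred_10) == len(pred_90)):
        raise ValueError("inputs must be non-empty and the same length")
    hits = sum(1 for y, lo, hi in zip(y_true, pred_10, pred_90) if lo <= y <= hi)
    return hits / len(y_true)


# Made-up values: three of the five observations fall inside their interval.
y_true = [100.0, 200.0, 300.0, 400.0, 500.0]
pred_10 = [90.0, 210.0, 250.0, 390.0, 400.0]
pred_90 = [110.0, 260.0, 350.0, 395.0, 600.0]
print(interval_coverage(y_true, pred_10, pred_90))  # 0.6
```

A value well below 0.8 suggests intervals that are too narrow, and a value well above 0.8 suggests intervals that are too wide.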

Submission format

In the Development Arena, the format for the submission is a CSV file where each row is a forecast for a single site (site_id) on a single issue date (issue_date). Each row will have three prediction columns volume_10, volume_50, and volume_90 for the respective quantile predictions. Predictions for water supply volume must be floating point number values. For example, 1.0 is a valid float, but 1 is not.

Example

For example, if you predicted...
site_id                        issue_date  volume_10  volume_50  volume_90
hungry_horse_reservoir_inflow  2005-01-01  0.0        0.0        0.0
hungry_horse_reservoir_inflow  2005-01-08  0.0        0.0        0.0
...                            ...         ...        ...        ...

The first few rows of the .csv file that you submit would look like:

site_id,issue_date,volume_10,volume_50,volume_90
hungry_horse_reservoir_inflow,2005-01-01,0.0,0.0,0.0
hungry_horse_reservoir_inflow,2005-01-08,0.0,0.0,0.0
...
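One way to produce a file in this format with the Python standard library is sketched below. The helper name `write_submission` is illustrative. Casting each prediction with `float()` ensures that whole-number values are written as, for example, 1.0 rather than 1.

```python
import csv
import io

HEADER = ["site_id", "issue_date", "volume_10", "volume_50", "volume_90"]


def write_submission(rows, fileobj) -> None:
    """Write (site_id, issue_date, v10, v50, v90) tuples as submission CSV."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(HEADER)
    for site_id, issue_date, v10, v50, v90 in rows:
        # float() guarantees a decimal point even for integer inputs.
        writer.writerow([site_id, issue_date, float(v10), float(v50), float(v90)])


buffer = io.StringIO()
write_submission(
    [
        ("hungry_horse_reservoir_inflow", "2005-01-01", 0, 0, 0),
        ("hungry_horse_reservoir_inflow", "2005-01-08", 0, 0, 0),
    ],
    buffer,
)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the in-memory buffer.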

Looking forward to the Evaluation Arena

The Development Arena is intended to help you get started on developing your model. Submissions to the Development Arena provide feedback on your performance and will not be considered for prizes. Instead, prizes for the Hindcast Stage will be awarded based on submissions to the Evaluation Arena (opening late November), where you will submit code, and inference on the test set will be run and evaluated in a remote code execution environment. Keep that in mind as you develop your solution!

Solutions are required to be implemented in either Python or R. Please note that challenge organizers will only be providing sample code in Python and not in R, and that solutions implemented in R may be required to call their R code through a Python wrapper. The version of Python in the runtime environment will be 3.10.

Your code will need to run inside a Docker container with dependencies pre-installed. There will be no internet access available from inside the container except to specific APIs for approved feature data sources. We will have a process where you can request the dependencies you need. Dependencies will need to be installable via the conda package manager. The runtime container will have access to a GPU. The runtime hardware and environment specification and the procedure to request additional dependencies will be provided in the coming weeks—please keep an eye out.

Frequently Asked Questions (FAQ)

What does it mean to treat water years as "independent"?

Independence here refers to independence in statistics and probability. It means that knowledge of the values for one observation (water year) does not give information about other observations. This is a modeling choice and an approximation of reality that balances tradeoffs towards the goals of water supply modeling. Hydrologically, this is justified because water supply is primarily driven by near-term physical processes such as precipitation and snowmelt.

What is the difference between training on other water years vs. inference that is independent of other water years?

When you train a model in machine learning, you are fitting parameters. The way these parameters are fit should not depend on the data you're later going to perform inference on. Then, once your model parameters are fixed, performing inference should use variable values for one water year at a time. When performing inference, you should not depend on any temporal relationship between the inference year and any other water year.

  • Is independent: Your feature depends on the aggregated value of a variable across training years, and this feature is calculated in the same way no matter the year in which you are performing inference. For example, your feature is the difference between the current value of a variable and the maximum value in the training dataset.
  • Not independent: Your feature depends on the difference between the current value of a variable and the value from the previous water year. This is using the relationship between the current year and another year.

A few potentially helpful concepts or rules of thumb:

  • Year should not be a variable in your modeling. You should treat year as if it's an identifier.
  • Pretend that year values have been replaced with random hashes (e.g., f9e0da, 04b390, 65cf98) and that they have been randomly shuffled. Does your model still work without any knowledge of the ordering of years?
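The "random hashes" rule of thumb can be checked mechanically for a given feature. The sketch below uses the "Is independent" example from above (current value minus the maximum over the training set) and shows that relabeling the training years with arbitrary identifiers leaves the fitted parameter unchanged. All names and values are made up for illustration.

```python
def fit_feature_params(train_values_by_year: dict) -> dict:
    """Fit a parameter that aggregates over training years, ignoring their order."""
    return {"train_max": max(train_values_by_year.values())}


def relative_to_train_max(current_value: float, params: dict) -> float:
    """Feature: current value minus the maximum seen in the training set."""
    return current_value - params["train_max"]


# Hypothetical per-year values of some variable in the training set.
train = {2004: 12.5, 2006: 20.0, 2008: 16.0}
params = fit_feature_params(train)

# Replacing years with arbitrary identifiers changes nothing, so the feature
# does not rely on the temporal relationship between years.
relabeled = {"f9e0da": 12.5, "04b390": 20.0, "65cf98": 16.0}
assert fit_feature_params(relabeled) == params

print(relative_to_train_max(14.0, params))  # -6.0
```

A feature such as "difference from the previous water year's value" could not be computed from the relabeled dictionary at all, which is a sign that it depends on the ordering of years.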

Some years in the training set are in the future of years in the test set (e.g., 2010 is a training year and 2009 is a test year). Am I allowed to train a model on data from 2010 and use it to predict for 2009?

Yes. Because we are treating water years as independent and not temporally sequenced, there is no relationship between 2010 and 2009. You should pretend that 2009 and 2010 are just identifier values, and that their data are independent realizations of random variables.

However, this also means that when you are making predictions for 2009, you shouldn't explicitly use any information from 2010 with the knowledge that it is the year after 2009.

When I make a prediction for a given issue date, what is the allowable time range of data to use?

An individual forecast on a given issue date may in general use feature variables derived from data that:

  • Starts at the beginning of that water year, i.e., October 1 of the previous calendar year
  • Ends on the day before the issue date

For example, a forecast for issue date 2017-03-15 may use data from 2016-10-01 through 2017-03-14.

There are some exceptions to the October 1 constraint for certain data sources. See the "Exceptions to lookback window water year limit" section for those exceptions.

Can I upload trained model weights and/or precomputed features for the code execution run?

Model training (fitting model weights and/or feature parameters) on the training set is expected to happen offline on your own hardware, and you should upload your trained model weights and feature parameters along with your code. Preprocessing features and computing predictions on the test set is expected to happen in the code execution run. If you need to download data associated with the test set from direct-API-access-approved data sources, you should do so within the code execution run.

  • Should upload: Model weights computed on the training set.
  • Should upload: Feature parameters computed on the training set (e.g., mean value of some variable over the water years of the training set).
  • Should not upload: Features computed from data corresponding to a water year in the test set.
  • Should not upload: Data from approved sources corresponding to a water year in the test set.

How do I authenticate with Copernicus Climate Data Store (CDS) in order to download data?

The Copernicus Climate Data Store (CDS) is approved for certain feature data sources. CDS is free but requires you to create an account and authenticate with an API key. In order to download data from CDS during the code execution run, you will need to provide your own API key as part of your submission. Only DrivenData staff will be able to access your submission contents, and we will only use your API key as part of running your submission. If you have any further questions or concerns, please let us know.

In order to properly authenticate the cdsapi client in your code, see this forum post for a few different approaches.

Good luck

Good luck and happy hindcasting! If you have any questions, you can always visit the competition forum!