Water Supply Forecast Rodeo: Hindcast Evaluation

Water managers in the Western U.S. rely on accurate water supply forecasts to better operate facilities and mitigate drought. Help the Bureau of Reclamation improve seasonal water supply estimates in this probabilistic forecasting challenge! [Hindcast Evaluation Arena]

$50,000 in prizes
January 2024
99 joined

Code submission format

In the Evaluation Arena for the Hindcast Stage (you are here!), you will submit both your trained model(s) and the code to make predictions. We will then use a containerized runtime in our cloud environment to generate predictions for the evaluation dataset. The resulting score will contribute to the criteria used for the final ranking.

In addition to the model code, solvers must also submit a model report that provides a detailed overview of their submission to be considered for prizes. Additional information on the evaluation criteria can be found in the Problem Description.

The runtime repository contains the complete specification for the runtime. If you want to learn more about how our code execution competitions work, check out our blog post for a peek behind the scenes.

What to submit

You will submit a ZIP archive containing everything needed to perform inference on the test set. At a minimum, this ZIP archive must contain a solution.py Python script that contains a predict function. The predict function should be able to produce a forecast for one site on one issue date and should match the following function signature:

from pathlib import Path
from typing import Any, Hashable


def predict(
    site_id: str,
    issue_date: str,
    assets: dict[Hashable, Any],
    src_dir: Path,
    data_dir: Path,
    preprocessed_dir: Path,
) -> tuple[float, float, float]:
    """A function that generates a forecast for a single site on a single issue
    date. This function will be called for each site and each issue date in the
    test set.

    Args:
        site_id (str): the ID of the site being forecasted.
        issue_date (str): the issue date of the site being forecasted in
            'YYYY-MM-DD' format.
        assets (dict[Hashable, Any]): a dictionary of any assets that you may
            have loaded in the 'preprocess' function. See next section.
        src_dir (Path): path to the directory that your submission ZIP archive
            contents are unzipped to.
        data_dir (Path): path to the mounted data drive.
        preprocessed_dir (Path): path to a directory where you can save any
            intermediate outputs for later use.
    Returns:
        tuple[float, float, float]: forecasted values for the seasonal water
            supply. The three values should be (0.10 quantile, 0.50 quantile,
            0.90 quantile).
    """
    return 0.0, 0.0, 0.0

The runtime will have supervisor code that calls your predict function to generate forecasts. The supervisor code will be responsible for compiling your predictions so that they match the submission format; you should not need to do anything further. The submission format will be a CSV with the same format as the ones submitted in the Development Arena. You can refer to the Problem Description for more details on the predictions format.
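
For illustration, the supervisor's role is roughly equivalent to the sketch below: it loops over the test set's (site, issue date) pairs, calls your predict function, and compiles the results into a CSV. The function name, the source of the assets argument (see the next section), and the output column names are assumptions made for this example; the runtime repository contains the actual supervisor code, and the Problem Description defines the authoritative submission format.

import pandas as pd

from solution import predict  # your submitted solution.py


def run_inference(pairs, assets, src_dir, data_dir, preprocessed_dir, output_path):
    """Call predict for every (site_id, issue_date) pair and compile a CSV."""
    rows = []
    for site_id, issue_date in pairs:
        q10, q50, q90 = predict(
            site_id, issue_date, assets, src_dir, data_dir, preprocessed_dir
        )
        # Column names below are illustrative, not the official submission schema.
        rows.append(
            {"site_id": site_id, "issue_date": issue_date,
             "volume_10": q10, "volume_50": q50, "volume_90": q90}
        )
    pd.DataFrame(rows).to_csv(output_path, index=False)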

Optional: preprocess function

In your solution.py, you can optionally include a function named preprocess that matches the function signature further below. This function will be called once, before the loop that calls your predict function, and it is intended for any setup that you'd like to do. Some possible examples include:

  • Downloading additional data that you need from data sources approved for direct API access
  • Preprocessing feature data and writing intermediate outputs to the preprocessed directory
  • Loading assets, such as model weights, that you intend to use across predictions

def preprocess(
    src_dir: Path, data_dir: Path, preprocessed_dir: Path
) -> dict[Hashable, Any]:
    """An optional function that performs setup or processing.

    Args:
        src_dir (Path): path to the directory that your submission ZIP archive
            contents are unzipped to.
        data_dir (Path): path to the mounted data drive.
        preprocessed_dir (Path): path to a directory where you can save any
            intermediate outputs for later use.

    Returns:
        (dict[Hashable, Any]): a dictionary containing any assets you want to
            hold in memory that will be passed to your 'predict' function as
            the keyword argument 'assets'.
    """
    return {}

If you need to save any intermediate outputs, you should write them to the provided preprocessed_dir path. This is a directory to which you have write permissions.

For anything that you're planning to load and hold in memory, you should add to the dictionary that is returned by the preprocess function. This dictionary will be passed into your predict function as assets.
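
As a concrete illustration, here is a minimal sketch of how preprocess and predict might work together. The file names ('model.pkl', 'features.csv'), the cached feature table, and the model's predict_quantiles method are all hypothetical; substitute whatever your solution actually uses.

import pickle
from pathlib import Path
from typing import Any, Hashable

import pandas as pd


def preprocess(
    src_dir: Path, data_dir: Path, preprocessed_dir: Path
) -> dict[Hashable, Any]:
    # Hypothetical example: load a pickled model shipped inside your ZIP archive
    # and cache a cleaned feature table for reuse across predict calls.
    with open(src_dir / "model.pkl", "rb") as f:
        model = pickle.load(f)

    features = pd.read_csv(data_dir / "features.csv")  # assumed mounted file name
    features.to_csv(preprocessed_dir / "features_clean.csv", index=False)

    return {"model": model, "features": features}


def predict(
    site_id: str,
    issue_date: str,
    assets: dict[Hashable, Any],
    src_dir: Path,
    data_dir: Path,
    preprocessed_dir: Path,
) -> tuple[float, float, float]:
    # Use the assets loaded once in preprocess instead of reloading them here.
    model = assets["model"]
    site_features = assets["features"].query("site_id == @site_id")
    # predict_quantiles is a hypothetical method on the hypothetical model object.
    q10, q50, q90 = model.predict_quantiles(site_features, issue_date)
    return float(q10), float(q50), float(q90)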

Submission types

There are two types of submissions: smoke tests and normal submissions.

  • Smoke tests are intended for you to test the correctness of your code and will run on only a subset of the test data in order to run more quickly. Smoke test results will not be considered in your Hindcast Stage evaluation. Smoke tests have a separate and more permissive submission limit.
  • Normal submissions will be run on the full test set. You will be allowed up to 3 normal submissions, from which you will choose 1 to be the final submission to be considered in your Hindcast Stage evaluation.

Use the toggle shown on the submission form to select the type of submission. Please see the following two sections for additional details.


[Screenshot: the submission form showing the "Normal submission" and "Smoke test" toggle.]

Smoke tests

We encourage you to make full use of smoke tests instead of full normal submissions when starting out. The code execution cluster is a shared resource and we ask that you be mindful in your use. To make a smoke test, be sure to select the "Smoke test" option on the submission form where you upload your code.

To reproduce a smoke test locally, you can use the smoke_submission_format.csv file found on the data download page. See instructions in the runtime repository.

The smoke test submission consists of forecasts for the following three sites and forecast years:

  • Sites: hungry_horse_reservoir_inflow, san_joaquin_river_millerton_reservoir, skagit_ross_reservoir
  • Forecast years: 2005, 2013, 2023
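
For a quick local sanity check before submitting, a small harness like the sketch below can loop over smoke_submission_format.csv and call your predict function directly. The local directory paths and the 'site_id' and 'issue_date' column names are assumptions made for this example; see the runtime repository for the supported way to reproduce a smoke test.

from pathlib import Path

import pandas as pd

from solution import predict, preprocess

# Hypothetical local paths; point these at your own unzipped submission and data folder.
src_dir = Path("src")
data_dir = Path("data")
preprocessed_dir = Path("preprocessed")
preprocessed_dir.mkdir(exist_ok=True)

assets = preprocess(src_dir, data_dir, preprocessed_dir)

smoke = pd.read_csv(data_dir / "smoke_submission_format.csv")
for row in smoke.itertuples():
    q10, q50, q90 = predict(
        row.site_id, row.issue_date, assets, src_dir, data_dir, preprocessed_dir
    )
    # The three returned values should be the 0.10, 0.50, and 0.90 quantiles, in order.
    assert q10 <= q50 <= q90, f"Quantiles out of order for {row.site_id} on {row.issue_date}"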

Normal submissions

Updated: December 11, 2023

To make a normal submission, select the "Normal submission" option on the submission form where you upload your code. You are allowed up to 3 total successful normal submissions, in case of any issues or mistakes. You should not use repeated normal submissions in the Evaluation Arena to iterate on your solution; instead, you should submit prediction CSV files to the Development Arena. Submissions in the Evaluation Arena and Development Arena are evaluated against the same ground truth test set data.

If you find yourself requiring additional submissions beyond this limit, please contact us in the challenge forum. Note that additional submissions are at the organizers' discretion and may not be granted if you are misusing Evaluation Arena submissions.

Only one submission will count as the official final code submission for the Hindcast Stage evaluation. You should ensure that your intended final submission is checked in the "Your submissions" table on the Submissions page by the submission deadline. If your intended final submission was made before the deadline but did not finish executing until after the deadline, please contact us.


[Screenshot: use the checkboxes on the right side of the "Your submissions" table on the Submissions page to select your final submission for evaluation.]

Directory Structure

The filesystem inside the runtime will look something like this:

/code_execution/  # Working directory during run
├── data/             # Data will be mounted here
├── preprocessed/     # Directory you can use for intermediate outputs
├── submission/       # Final predictions will be written here
├── src/              # Your submission will be unzipped to here
└── supervisor.py     # The runtime will run this script which imports your code

The runtime will use /code_execution as the active working directory.

Data will be mounted to /code_execution/data. This will include both the files from the data download page that are relevant to the test set and the rehosted feature data from the approved data sources. This directory will be read-only.

You will be able to write any intermediate outputs to the /code_execution/preprocessed directory. If you need to load anything that you included in your submitted ZIP archive, it will be available inside the /code_execution/src directory.

Both the predict and preprocess functions will be passed pathlib.Path objects named src_dir, data_dir, and preprocessed_dir that point to their respective directories. Use these Path objects instead of hard-coding file paths.
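
For example, a helper inside your solution might combine the provided paths as in the sketch below. The file names here ('features.csv' and the per-site cache file) are hypothetical.

from pathlib import Path

import pandas as pd


def cache_site_features(site_id: str, data_dir: Path, preprocessed_dir: Path) -> Path:
    """Hypothetical helper: slice a mounted file once per site and cache the result."""
    out_path = preprocessed_dir / f"{site_id}_features.csv"
    if not out_path.exists():
        # data_dir is read-only, so derived files must be written to preprocessed_dir.
        features = pd.read_csv(data_dir / "features.csv")
        features[features["site_id"] == site_id].to_csv(out_path, index=False)
    return out_path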

Runtime hardware and constraints

Updated December 18, 2023 to increase time limit for submissions.

The runtime container will be run on a cloud virtual machine with the following specifications:

  • 4 vCPUs
  • 28 GiB memory
  • 180 GiB temporary storage
  • 1 GPU (Nvidia Tesla T4)
  • 16 GiB GPU memory

Normal submissions must complete execution in 4 hours or less. This limit is especially helpful to address non-terminating code, and we expect most submissions to complete more quickly. If you find yourself requiring more time than this limit allows, please contact us in the challenge forum.

Smoke tests will have a shorter time limit of 30 minutes. Relative to the size of the smoke test subset, this is proportionally more time than the 4-hour limit allows for a full normal submission, so plan accordingly: if your smoke test only just finishes within 30 minutes, your full submission is likely to exceed its limit.

The machines that execute your code are a shared resource across competitors, so please be conscientious in your use of them. Add progress information to your logs, and cancel jobs that will run longer than the time limit. Canceled jobs won't count against your submission limit, and canceling them frees up resources to score submissions that will complete on time.
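
One lightweight way to add progress information is a small heartbeat helper like the sketch below, called at the top of your predict function. The logging cadence and message format are arbitrary choices for this example; any output that shows up in the run logs works.

import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
logger = logging.getLogger(__name__)

_START = time.monotonic()
_CALLS = 0


def log_progress(site_id: str, issue_date: str, every: int = 50) -> None:
    """Log a heartbeat so you can spot runs on track to exceed the time limit."""
    global _CALLS
    _CALLS += 1
    if _CALLS % every == 0:
        elapsed = time.monotonic() - _START
        logger.info(
            "predict call %d (%s, %s) after %.1f seconds",
            _CALLS, site_id, issue_date, elapsed,
        )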

Runtime environment

The specification for the container image is provided in the competition runtime repository. We use conda as the environment and dependency manager. Dependencies are fully resolved using conda-lock, and the lockfiles are provided in the repository.

The runtime environment uses Python 3.10.13 and already includes many common machine learning and geospatial data packages. If you need additional package dependencies for your solution, please see the section below.

Requesting package installations

Since the runtime container will only have limited network access for specific data sources, all packages must be pre-installed. We are happy to add packages as long as they do not conflict with the existing environment and can build successfully. Packages must be available through conda-forge (preferred) or PyPI for Python 3.10.13. To request that additional packages be added to the runtime image, follow the instructions in the runtime repository.


Happy building! Once again, if you have any questions or issues, you can always head over to the user forum!