Water Supply Forecast Rodeo: Forecast Stage

[submissions closed] Water managers in the Western U.S. rely on accurate water supply forecasts to better operate facilities and mitigate drought. Help the Bureau of Reclamation improve seasonal water supply estimates in this probabilistic forecasting challenge! [Forecast Stage] #climate

$50,000 in prizes

Code submission format

In the Forecast Stage, you will submit a ZIP archive containing your trained model and the code to make predictions. The format matches what you were required to submit for the Hindcast Stage Evaluation Arena, and the same submission should generally be able to run successfully and satisfy requirements. (However, you are welcome to retrain your model on the additional training data and/or make other updates, as long as you track the changes so that you can still provide training code to reproduce the Hindcast Stage version of your model.)

The runtime repository contains the complete specification for the runtime. If you want to learn more about how our code execution competitions work, check out our blog post for a peek behind the scenes.

How submissions work

A submission refers to a set of predictions generated by a model. The Forecast Stage is a code execution competition, where the submission will always be generated by an associated code job—the remote execution of your code on DrivenData's cluster. In the Forecast Stage, there will be two types of submissions / code jobs:

  • User-uploaded submission: you have uploaded new code, which will immediately put a code job into the queue.
  • Admin-scheduled submission: DrivenData will schedule a code job on your behalf without any action from you. It will copy the code from the most recent previous successful code job.

Both types will show up as entries in the "Your code jobs" and "Your submissions" tables. Predictions for the latest successful submission—no matter the type—will appear on the leaderboard.

During the Forecast Stage, both types of submissions will always run to generate predictions for only a single issue date. At any time, there will be one active issue date that all submissions will run against. Initially, the active issue date will be 2023-01-01 (this past January) to emulate the 2024 issue dates. On January 1, 2024, we will cut over the active issue date to 2024-01-01, and similarly on January 8 to 2024-01-08, and so on.

During the open submission period through January 11, 2024:

  • You will be able to make user-uploaded submissions freely
  • There will be two admin-scheduled submissions

During the evaluation period beginning January 15, 2024:

  • Admin-scheduled submissions will happen on the designated issue dates four times each month
  • You will not be able to make user-uploaded submissions unless you are fixing a failed submission

See the following sections for additional details.

Open submission period

The open submission period runs through January 11, 2024 at 11:59 pm UTC. During the open submission period, you may submit code as many times as you'd like (subject to a daily limit shown on the submissions page). The latest successful code submission at the deadline will be used as your official Forecast Stage submission. If you are happy with the latest successful code submission, there is no reason to continue resubmitting the same thing. You only need to resubmit if you intend to change your code or model.

There will be two admin-scheduled submissions that overlap with the open submission period—January 1, 2024 and January 8, 2024. Each will only happen if there is at least one successful submission beforehand. These two issue dates are trial runs and will be excluded from your final Forecast Stage score. You are encouraged to submit early so that you can use these admin-scheduled runs to verify that things are working as expected.

Here are some scenarios illustrating what might happen:

Scenario A

  • Dec 30: You submit model v1. It runs for issue date 2023-01-01 successfully.
  • Dec 31: You submit model v2. It runs for issue date 2023-01-01 but fails.
  • Jan 1: Admin-scheduled job uses model v1. It runs for issue date 2024-01-01 successfully.
  • Jan 8: Admin-scheduled job uses model v1. It runs for issue date 2024-01-08 successfully.

Scenario B

  • Dec 30: You submit model v1. It runs for issue date 2023-01-01 successfully.
  • Jan 1: Admin-scheduled job uses model v1. It runs for issue date 2024-01-01 successfully.
  • Jan 4: You submit model v2. It runs for issue date 2024-01-01 successfully.
  • Jan 8: Admin-scheduled job uses model v2. It runs for issue date 2024-01-08 successfully.

Scenario C

  • Jan 1: No admin-scheduled job because no prior submission.
  • Jan 4: You submit model v1. It runs for issue date 2024-01-01 successfully.
  • Jan 8: Admin-scheduled job uses model v1. It runs for issue date 2024-01-08 successfully.
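The rule behind these scenarios (an admin-scheduled job copies the code from the most recent previous successful code job) can be sketched in a few lines of Python. This is only an illustration: the `CodeJob` record and `code_for_admin_job` function are hypothetical names and do not reflect how DrivenData's scheduler is actually implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodeJob:
    """Hypothetical record of one code job (not DrivenData's data model)."""

    run_date: str  # ISO date the job ran, e.g. "2023-12-30"
    model: str  # label for the code that ran, e.g. "v1"
    succeeded: bool


def code_for_admin_job(jobs: list[CodeJob], run_date: str) -> Optional[str]:
    """Return the model used by the most recent successful job before run_date.

    Returns None when there is no prior successful job, in which case no
    admin-scheduled job would run.
    """
    # ISO-formatted dates sort chronologically when compared as strings.
    prior = [j for j in jobs if j.succeeded and j.run_date < run_date]
    if not prior:
        return None
    return max(prior, key=lambda j: j.run_date).model


# Scenario A: v2 failed, so the January 1 job falls back to v1.
scenario_a = [
    CodeJob("2023-12-30", "v1", True),
    CodeJob("2023-12-31", "v2", False),
]
print(code_for_admin_job(scenario_a, "2024-01-01"))
```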

Diagnostic public "score"

Since ground truth data will not be available until August 14, 2024, your predictions will be validated for the correct format but not actually scored. Instead, the primary metric shown in the "score" field is the issue date of that submission in YYYYMMDD format. The leaderboard will show the issue date corresponding to your latest successful submission.
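As a small illustration of the diagnostic value, the conversion from an issue date to the YYYYMMDD form might look like the following. The function name is hypothetical, and representing the score as an integer is an assumption; the platform only states that the field shows the issue date in YYYYMMDD format.

```python
from datetime import date


def diagnostic_score(issue_date: str) -> int:
    """Convert a 'YYYY-MM-DD' issue date to its YYYYMMDD form as an integer."""
    # date.fromisoformat also validates that the input is a real calendar date.
    return int(date.fromisoformat(issue_date).strftime("%Y%m%d"))


print(diagnostic_score("2024-01-08"))  # 20240108
```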

Evaluation period

Beginning with 2024-01-15 as the first issue date that is included in your final score, DrivenData will run admin-scheduled submissions on each issue date through 2024-07-22. This will happen automatically without any action required from you.
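To make the schedule concrete, here is a rough sketch that enumerates the scored issue dates. It assumes issue dates fall on the 1st, 8th, 15th, and 22nd of each month, which is inferred from the dates mentioned on this page ("four times each month", 2024-01-01, 2024-01-08, 2024-01-15, 2024-07-22) rather than stated outright. The function name is hypothetical.

```python
from datetime import date

# Assumption: issue dates fall on these days of each month.
ISSUE_DAYS = (1, 8, 15, 22)


def scored_issue_dates(
    first: date = date(2024, 1, 15), last: date = date(2024, 7, 22)
) -> list[date]:
    """List issue dates between first and last (inclusive, same calendar year)."""
    dates = []
    for month in range(first.month, last.month + 1):
        for day in ISSUE_DAYS:
            candidate = date(first.year, month, day)
            if first <= candidate <= last:
                dates.append(candidate)
    return dates


dates = scored_issue_dates()
print(len(dates), dates[0], dates[-1])
```

Under that assumption, the evaluation period covers 26 scored issue dates.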

Modifications and job failures

Under normal circumstances, you will not be able to modify your code or model during the evaluation period.

If a scheduled job fails, DrivenData will send your team an email notifying you of the failure. You will then have the opportunity to make an updated code submission specifically to address the cause of failure.

When submitting updated code, you are only allowed to fix correctness issues that affect successful code execution. You are not allowed to make substantive changes to your model that affect the quality of your predictions. Changes will be reviewed and any violations will lead to disqualification. Challenge organizers will have final say in determining if changes are permissible. If you have any questions, please ask by email or in the challenge forum.

Some examples of permitted and not permitted changes:

  • Permitted: Updating your data processing code to fix a runtime error if some specific piece of data is not available.
  • Permitted: Updating your data processing code to handle a change in data format for some data source.
  • Permitted: Adding retries to data download code that intermittently fails.
  • Forbidden: Updating trained model weights.
  • Forbidden: Adding or removing features from your model.
  • Forbidden: Substantively changing how a feature is calculated.
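As an illustration of the third permitted example, retries around a flaky download could be added with a small wrapper like the one below. This is a generic sketch, not code from the runtime repository: `with_retries` and its parameters are hypothetical, and in real code you would likely catch only the specific network exceptions your download library raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff if it raises."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            # Broad catch for illustration only; narrow this in real code.
            if attempt == attempts:
                raise
            # Wait 1x, 2x, 4x, ... the base delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

A call such as `with_retries(lambda: download(url))`, where `download` stands in for your own function, leaves the downloaded data and the model unchanged, which keeps the fix within the kind of correctness change described above.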

You must also include a CHANGELOG.md markdown file in your submission ZIP that cumulatively documents your changes per the following guidelines:

  • Each entry should be dated by the issue date on which the change takes effect (i.e., the issue date that you are submitting a fix for).
  • Clearly and thoroughly explain what change you are making and why.

You will have until the following issue date to submit your updated code (approximately 1 week). Fixes can only be submitted for the most recent failed issue date. For any failed issue dates that do not get fixed, the most recent prior successful predictions will be filled forward when calculating your final score. During the Forecast Stage, you have one allowance to address a missed fix for a past issue date; to use it, please contact us at info@drivendata.org.
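The fill-forward rule can be sketched as follows for a single site. This is an assumption-laden illustration, not the organizers' scoring code: the `fill_forward` name and the use of `None` to mark a failed issue date are both hypothetical.

```python
from typing import Optional

Forecast = tuple[float, float, float]  # (0.10, 0.50, 0.90) quantiles


def fill_forward(
    predictions: dict[str, Optional[Forecast]]
) -> dict[str, Optional[Forecast]]:
    """Replace missing forecasts with the most recent prior successful one.

    Keys are 'YYYY-MM-DD' issue dates; a value of None marks a failed run.
    Dates before the first successful run stay None.
    """
    filled: dict[str, Optional[Forecast]] = {}
    last: Optional[Forecast] = None
    for issue_date in sorted(predictions):  # ISO dates sort chronologically
        current = predictions[issue_date]
        if current is not None:
            last = current
        filled[issue_date] = last
    return filled


example = {
    "2024-01-15": (80.0, 100.0, 120.0),
    "2024-01-22": None,  # failed and never fixed
    "2024-02-01": (85.0, 105.0, 125.0),
}
print(fill_forward(example)["2024-01-22"])
```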

Feature data

A mounted data drive will be available in the code execution runtime in the same way as with the Hindcast Evaluation Arena. DrivenData will run the code from data_download/ in the data and runtime repository to populate the mounted drive for that issue date. All participants' code jobs for a given issue date will have access to the same mounted data drive.

For any data sources where direct API access is permitted and you are downloading data, you are responsible for correctly subsetting the available data based on the issue date. Specifically, your model should only use data through the day before the issue date. In general, code jobs may be run after the issue date, and you should do your best to ensure that your model uses the appropriate data. Failure to do so or deliberate attempts to gain an unfair advantage will lead to disqualification.
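A minimal sketch of this subsetting, assuming each downloaded record carries an ISO-formatted date: the `rows_available_for` helper and the `date` key are hypothetical and should be adapted to the actual schema of the data source you are using.

```python
from datetime import date


def rows_available_for(
    rows: list[dict], issue_date: str, date_key: str = "date"
) -> list[dict]:
    """Keep only records dated strictly before the issue date.

    "Through the day before the issue date" means a record dated on the
    issue date itself must be excluded.
    """
    cutoff = date.fromisoformat(issue_date)
    return [r for r in rows if date.fromisoformat(r[date_key]) < cutoff]


observations = [
    {"date": "2024-01-13", "value": 4.2},
    {"date": "2024-01-14", "value": 4.5},
    {"date": "2024-01-15", "value": 4.9},  # on the issue date: not allowed
    {"date": "2024-01-16", "value": 5.1},  # after the issue date: not allowed
]
print(rows_available_for(observations, "2024-01-15"))
```

Filtering by the `issue_date` argument rather than by the current clock time matters because code jobs may run after the issue date.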

What to submit

The submission format matches that of the Hindcast Evaluation Arena. You will submit a ZIP archive containing everything needed to perform inference for a given issue date. At a minimum, this ZIP archive must contain a solution.py Python script that contains a predict function. The predict function should be able to produce a forecast for one site on one issue date and should match the following function signature:

from collections.abc import Hashable
from pathlib import Path
from typing import Any


def predict(
    site_id: str,
    issue_date: str,
    assets: dict[Hashable, Any],
    src_dir: Path,
    data_dir: Path,
    preprocessed_dir: Path,
) -> tuple[float, float, float]:
    """A function that generates a forecast for a single site on a single issue
    date. This function will be called for each site and each issue date in the
    test set.

    Args:
        site_id (str): the ID of the site being forecasted.
        issue_date (str): the issue date of the site being forecasted in
            'YYYY-MM-DD' format.
        assets (dict[Hashable, Any]): a dictionary of any assets that you may
            have loaded in the 'preprocess' function. See next section.
        src_dir (Path): path to the directory that your submission ZIP archive
            contents are unzipped to.
        data_dir (Path): path to the mounted data drive.
        preprocessed_dir (Path): path to a directory where you can save any
            intermediate outputs for later use.
    Returns:
        tuple[float, float, float]: forecasted values for the seasonal water
            supply. The three values should be (0.10 quantile, 0.50 quantile,
            0.90 quantile).
    """
    return 0.0, 0.0, 0.0

The runtime will have supervisor code that calls your predict function to generate forecasts. The supervisor code will be responsible for compiling your predictions so that they match the submission format, and you should not need to do anything further. The run will output a CSV file as detailed in the problem description.

Optional: preprocess function

In your solution.py, you can optionally include a function named preprocess that matches the function signature further below. This function will be called once before the loop calling your predict function, and it is intended for you to do any setup that you'd like. Some possible examples include:

  • Downloading additional data that you need for data sources approved for direct API access
  • Preprocessing feature data and writing intermediate outputs to the preprocessed directory
  • Loading assets, such as model weights, that you intend to use across predictions

def preprocess(
    src_dir: Path, data_dir: Path, preprocessed_dir: Path
) -> dict[Hashable, Any]:
    """An optional function that performs setup or processing.

    Args:
        src_dir (Path): path to the directory that your submission ZIP archive
            contents are unzipped to.
        data_dir (Path): path to the mounted data drive.
        preprocessed_dir (Path): path to a directory where you can save any
            intermediate outputs for later use.

    Returns:
        (dict[Hashable, Any]): a dictionary containing any assets you want to
            hold in memory that will be passed to your 'predict' function as
            the keyword argument 'assets'.
    """
    return {}

To save intermediate outputs, you should write to the provided preprocessed_dir path—a directory with write permissions for your use.

For anything that you're planning to load and hold in memory, you should add to the dictionary that is returned by the preprocess function. This dictionary will be passed into your predict function as assets.
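To show how the two functions fit together, here is a minimal sketch of a solution.py in which preprocess loads an asset once and predict reads it from assets. The file names `site_quantiles.json` and `sites_seen.txt`, and the idea of shipping precomputed quantiles per site, are purely hypothetical placeholders for whatever your model actually needs.

```python
import json
from collections.abc import Hashable
from pathlib import Path
from typing import Any


def preprocess(
    src_dir: Path, data_dir: Path, preprocessed_dir: Path
) -> dict[Hashable, Any]:
    """Load assets once and write an intermediate output."""
    # Hypothetical file shipped inside the submission ZIP archive.
    with (src_dir / "site_quantiles.json").open() as f:
        quantiles = json.load(f)
    # Intermediate outputs go to the writable preprocessed directory.
    (preprocessed_dir / "sites_seen.txt").write_text("\n".join(sorted(quantiles)))
    # Anything returned here is passed to predict as `assets`.
    return {"quantiles": quantiles}


def predict(
    site_id: str,
    issue_date: str,
    assets: dict[Hashable, Any],
    src_dir: Path,
    data_dir: Path,
    preprocessed_dir: Path,
) -> tuple[float, float, float]:
    """Return (0.10, 0.50, 0.90) quantile forecasts for one site."""
    q10, q50, q90 = assets["quantiles"][site_id]
    return float(q10), float(q50), float(q90)
```

Loading in preprocess rather than in predict avoids re-reading the same file for every site.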

Directory Structure

The filesystem inside the runtime will be the same as the Hindcast Evaluation Arena. It will look something like this:

/code_execution/  # Working directory during run
├── data/             # Data will be mounted here
├── preprocessed/     # Directory you can use for intermediate outputs
├── submission/       # Final predictions will be written here
├── src/              # Your submission will be unzipped to here
└── supervisor.py     # The runtime will run this script which imports your code

The runtime will use /code_execution as the active working directory.

Data will be mounted to /code_execution/data. This will include both files from the data download page relevant to the test set, as well as the rehosted feature data from the approved data sources. This directory will be read-only.

Instead, you will be able to write any intermediate outputs to the /code_execution/preprocessed directory. If you need to load anything that you included in your submitted ZIP archive, it will be available inside the /code_execution/src directory.

Both the predict function and the preprocess function will be passed pathlib.Path objects called src_dir, data_dir, and preprocessed_dir that point to their respective directories. You can use these Path objects instead of hard-coding file paths.

Accessing mounted data for debugging

Added January 4, 2024

In order to help you debug your submissions, we are making the mounted data containers accessible to you with read-only access during the open submission period. You can find access credentials on the data download page as mounted_data_credentials.csv. This data is stored using the Azure Blob Storage object storage service.

The credentials CSV file contains the following columns:

  • issue_date — issue date that this data container is for
  • credentials_expire — when the credentials expire in UTC
  • base_uri — base URI, see notes on syntax below
  • container_name — container name, see notes on syntax below
  • sas_token — credential that authenticates access, see notes on syntax below

Files in Azure Blob Storage are identified by a URI that is in the form of a URL, for example: https://someaccount.blob.core.windows.net/somecontainer/somedirectory/somefile.csv. The documentation from Azure is available here. A container is a collection of files, and for this competition, we have separate containers for each issue date. To get the URI for a given issue date's container, you should concatenate the row's base_uri with container_name joined by a forward slash:

"{base_uri}/{container_name}"

The credential that authenticates access is called a shared access signature (SAS) token and is specific to each container. It should be appended to the end of a resource URI as a query string with a question mark, e.g.,

"{base_uri}/{container_name}?{sas_token}"

Here are some example shell commands using the azcopy command-line utility that may be useful:

# Set these variables with real values from CSV
# Make sure they are quoted to escape special characters
export DATA_BASE_URI="https://someaccount.blob.core.windows.net"
export DATA_CONTAINER_NAME="some-container-name"
export DATA_SAS_TOKEN="si=participants&spr=https&sv=2023-01-01&sr=c&sig=000000000000000000000000000000000000000000000000"

# To list contents of the container:
# (URLs are quoted so the shell does not treat `?` as a glob character)
azcopy list "$DATA_BASE_URI/$DATA_CONTAINER_NAME?$DATA_SAS_TOKEN"

# To download everything in the container to a directory named `data/` in your current directory:
azcopy cp "$DATA_BASE_URI/$DATA_CONTAINER_NAME?$DATA_SAS_TOKEN" ./data/ --recursive --as-subdir=false

# To download a specific file `snotel/FY2023/1005_CO_SNTL.csv`:
azcopy cp "$DATA_BASE_URI/$DATA_CONTAINER_NAME/snotel/FY2023/1005_CO_SNTL.csv?$DATA_SAS_TOKEN" ./data/snotel/FY2023/1005_CO_SNTL.csv

# To download a directory `snotel/` recursively:
azcopy cp "$DATA_BASE_URI/$DATA_CONTAINER_NAME/snotel/?$DATA_SAS_TOKEN" ./data/snotel/ --recursive --as-subdir=false

If you prefer to use Python to inspect or download the data, check out cloudpathlib, an open-source library that we've developed at DrivenData.
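As a dependency-free starting point, the URI syntax described above can also be assembled with the Python standard library. The sketch below only builds URL strings from the credentials CSV columns and makes no network calls; the `load_credentials` and `container_url` helper names are hypothetical, and the sample row uses the same placeholder values as the shell example.

```python
import csv
import io


def load_credentials(csv_text: str) -> dict[str, dict[str, str]]:
    """Index the rows of mounted_data_credentials.csv by issue_date."""
    return {row["issue_date"]: row for row in csv.DictReader(io.StringIO(csv_text))}


def container_url(row: dict[str, str], blob_path: str = "") -> str:
    """Build '{base_uri}/{container_name}[/{blob_path}]?{sas_token}'."""
    url = f"{row['base_uri'].rstrip('/')}/{row['container_name']}"
    if blob_path:
        url += "/" + blob_path.lstrip("/")
    # The SAS token is appended as the query string after a question mark.
    return f"{url}?{row['sas_token']}"


# Placeholder values only; real values come from the credentials CSV.
sample = (
    "issue_date,credentials_expire,base_uri,container_name,sas_token\n"
    "2024-01-01,2024-01-12T00:00:00,https://someaccount.blob.core.windows.net,"
    "some-container-name,si=participants&spr=https&sig=000\n"
)
row = load_credentials(sample)["2024-01-01"]
print(container_url(row, "snotel/FY2023/1005_CO_SNTL.csv"))
```

When working with the real file, you would pass its contents (for example from `Path("mounted_data_credentials.csv").read_text()`) to `load_credentials`.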

Runtime hardware and constraints

The runtime hardware is the same as the Hindcast Evaluation Arena. Your job will be run in a container on a cloud virtual machine with the following specifications:

  • 4 vCPUs
  • 28 GiB memory
  • 180 GiB temporary storage
  • 1 GPU (Nvidia Tesla T4)
  • 16 GiB GPU memory

The submission must complete execution in 30 minutes or less. This limit is especially helpful to address non-terminating code, and we expect most submissions to complete more quickly. If you find yourself requiring more time than this limit allows, please contact us in the challenge forum.

The machines for executing your code are a shared resource across competitors. During the open submission period, please be conscientious in your use of them. Please add progress information to your logs and cancel jobs that will run longer than the time limit. Canceled jobs won't count against your daily submission limit, and this means more available resources to score submissions that will complete on time.
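One way to add progress information to your logs is a small helper that reports how many items are done and projects the total runtime against the 30-minute limit. This is a sketch under assumptions: the `ProgressLogger` class is hypothetical, the projection is a simple linear extrapolation, and how log output is surfaced for a code job depends on the platform.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("solution")


class ProgressLogger:
    """Log progress and warn when the projected runtime exceeds a budget."""

    def __init__(
        self,
        total: int,
        budget_seconds: float = 30 * 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.total = total
        self.budget_seconds = budget_seconds
        self.clock = clock
        self.start = clock()
        self.done = 0

    def step(self) -> float:
        """Record one finished item and return the projected total seconds."""
        self.done += 1
        elapsed = self.clock() - self.start
        # Linear extrapolation from the average time per finished item.
        projected = elapsed / self.done * self.total
        logger.info(
            "%d/%d done, %.0fs elapsed, ~%.0fs projected",
            self.done, self.total, elapsed, projected,
        )
        if projected > self.budget_seconds:
            logger.warning("Projected runtime exceeds %.0fs", self.budget_seconds)
        return projected
```

Calling `step()` once per site inside predict, with the instance stored in the assets dictionary returned by preprocess, would make it easier to spot a job that is on track to exceed the limit so that you can cancel it early.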

Runtime environment

The container image is the same one from the Hindcast Evaluation Arena. The specifications for the container image are provided in the competition runtime repository. We are using conda as the environment and dependency manager. Dependencies are fully resolved using conda-lock and the lockfiles that fully list all packages are provided in the repository. The runtime environment has Python 3.10.13 and R 4.3.2.

Requesting package installations

In general, the dependencies you need for your solution should already be present from the Hindcast Stage. If you need to request additional packages, please follow the same instructions in the runtime repository. We are happy to add packages as long as they do not conflict and can build successfully. You should submit your requests by January 5, 2024 to ensure we have time to review before the open submission deadline.


Happy building! Once again, if you have any questions or issues you can always head on over to the user forum!