MagNet: Model the Geomagnetic Field Hosted By NOAA

Competition status: complete
Prize pool: $30,000

Woohoo! This competition has come to a close!

Many thanks to the participants for all of their hard work and commitment to using data for good!

Code submission format

In a typical competition, you would craft your algorithms and generate outputs for the evaluation dataset on your local machine. Then you would submit the output to the competition for scoring.

For this competition, you'll submit your model files with the code to make predictions, and we will generate outputs for the evaluation dataset in a containerized runtime in our cloud environment. The runtime repository contains the complete specification for the runtime.

What to submit

Your final submission should be a zip archive named with the extension .zip (for example, submission.zip). The root level of submission.zip must contain a predict.py that implements a function predict_dst, which takes up to seven days' worth of data and makes predictions for the current hour t and the following hour t+1, as follows:

import pandas as pd
from typing import Tuple


def predict_dst(
    solar_wind_7d: pd.DataFrame,
    satellite_positions_7d: pd.DataFrame,
    latest_sunspot_number: float,
) -> Tuple[float, float]:
    """
    Take all of the data up until time t-1, and then make predictions for
    times t and t+1.

    Parameters
    ----------
    solar_wind_7d: pd.DataFrame
        The last 7 days of solar wind data up until (t - 1) minutes [exclusive of t]
    satellite_positions_7d: pd.DataFrame
        The last 7 days of satellite position data up until the present time [inclusive of t]
    latest_sunspot_number: float
        The latest monthly sunspot number (SSN) to be available

    Returns
    -------
    predictions : Tuple[float, float]
        A tuple of two predictions, for (t and t + 1 hour) respectively; these should
        be between -2,000 and 500.
    """

    ########################################################################
    #                         YOUR CODE HERE!                              #
    ########################################################################

    # this is a naive baseline where we just guess the training data mean every time
    prediction_at_t0 = -12
    prediction_at_t1 = -12

    return prediction_at_t0, prediction_at_t1

This function will be called by our main loop (see main.py) many times with seven days of data at a time. We do not guarantee any particular order and we expect that you will not try to maintain state between calls. Making fast predictions on a small subset of past data is an explicit design goal of this challenge.

Note: Your predictions must be in the physically plausible region between -2,000 and 500. Predictions outside this region will cause your submission to be rejected.
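Since out-of-range (or NaN) predictions cause a submission to be rejected, it can be worth clamping values just before returning them. A minimal sketch; the `clamp_dst` helper and its -12.0 fallback are illustrative, not part of the required API:

```python
def clamp_dst(value: float, lo: float = -2000.0, hi: float = 500.0,
              fallback: float = -12.0) -> float:
    """Clamp a Dst prediction into the accepted [-2000, 500] range.

    NaN (which compares unequal to itself) falls back to a safe in-range
    constant; -12.0 here is just the naive baseline value from above.
    """
    if value != value:  # NaN check without extra imports
        return fallback
    return max(lo, min(hi, value))
```

Calling this on both elements of the returned tuple is a cheap last line of defense against a model occasionally producing an extreme value.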

Your code will not have network access, so you should package any necessary resources into your submission. Your function may load model artifacts, call into other Python files, and use other resources you have packaged into the zipped submission. You may not load the data files in /data directly.
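Because there is no network access, model weights have to travel inside submission.zip and be loaded from disk at prediction time. One way to do this robustly is to resolve paths relative to predict.py itself, so loading works regardless of the container's working directory. A sketch; the model.pkl filename is a hypothetical example:

```python
import pickle
from pathlib import Path

# Resolve paths relative to this file, not the current working directory.
ASSETS_DIR = Path(__file__).resolve().parent


def load_model(filename: str = "model.pkl"):
    """Load a pickled model shipped alongside predict.py in submission.zip."""
    with open(ASSETS_DIR / filename, "rb") as f:
        return pickle.load(f)
```

Loading the model once at module import time (rather than inside predict_dst) keeps each prediction call fast, which matters given the per-prediction time limit.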

The data that gets passed to your predict_dst function is identical to the data described in the problem description, but limited to the seven days leading up to a prediction time. Here is what these look like assuming that you are at timedelta 44 days 00:00:00 and making a Dst prediction for t=44 days 00:00:00 (now) and t+1=44 days 01:00:00 (one hour from now):

solar_wind_7d

The solar wind data is provided per minute, so each seven-day dataframe will have 10,080 rows (7 × 24 × 60), like this:

                  bx_gse  by_gse  bz_gse  theta_gse  phi_gse  bx_gsm  by_gsm  bz_gsm  theta_gsm  phi_gsm    bt  density   speed  temperature  source
timedelta
37 days 00:00:00   -5.26    2.45    1.62      15.46   155.34   -5.26    2.45    1.62      15.46   155.34  6.12     3.65  353.56     119329.0      ac
37 days 00:01:00   -5.38    2.23    1.90      17.82   157.79   -5.38    2.23    1.90      17.82   157.79  6.21     3.92  354.05     103905.0      ac
37 days 00:02:00   -5.31    1.85    1.94      18.73   161.07   -5.31    1.85    1.94      18.73   161.07  6.05     4.18  353.87     102326.0      ac
37 days 00:03:00   -5.25    1.64    1.90      18.86   162.75   -5.25    1.64    1.90      18.86   162.75  5.88     4.15  350.32     109681.0      ac
...                  ...     ...     ...        ...      ...     ...     ...     ...        ...      ...   ...      ...     ...          ...     ...
43 days 23:57:00   -6.43    0.41    2.27      19.13   176.29   -6.43    0.41    2.27      19.13   176.29  6.93     2.78  393.93      30021.0      ac
43 days 23:58:00   -6.49    0.37    2.29      19.06   176.73   -6.49    0.37    2.29      19.06   176.73  6.98     3.00  394.35      27075.0      ac
43 days 23:59:00     NaN     NaN     NaN        NaN      NaN     NaN     NaN     NaN        NaN      NaN   NaN      NaN     NaN          NaN     NaN

As you can see, some of these may be missing (as seen in the last row). You will have to choose a sensible way to handle missing (NaN) values.
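One common approach, shown here as a sketch rather than a required method, is to linearly interpolate short gaps in the numeric columns and then forward/back-fill anything left at the edges of the window:

```python
import pandas as pd


def impute_solar_wind(df: pd.DataFrame) -> pd.DataFrame:
    """Fill NaNs in the numeric solar wind columns.

    Linear interpolation handles isolated missing minutes; ffill/bfill
    cover gaps at the start or end of the 7-day window. The non-numeric
    'source' column is left untouched.
    """
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].interpolate().ffill().bfill()
    return out
```

Whether interpolation, a sentinel value, or a model that tolerates missing inputs works best is a modeling decision; the point is only that the NaNs must be handled deliberately before prediction.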

satellite_positions_7d (pandas DataFrame)

This is daily data, so you will get seven rows:

gse_x_ace gse_y_ace gse_z_ace gse_x_dscovr gse_y_dscovr gse_z_dscovr
timedelta
38 days 1544159.2 -162085.4 86051.1 NaN NaN NaN
39 days 1543593.2 -169941.3 75850.4 NaN NaN NaN
40 days 1542170.1 -175305.5 71280.5 NaN NaN NaN
41 days 1540515.5 -180435.9 66649.7 NaN NaN NaN
42 days 1538486.4 -185278.1 61941.0 NaN NaN NaN
43 days 1536138.6 -189933.8 57176.8 NaN NaN NaN
44 days 1533530.2 -194457.0 52370.3 NaN NaN NaN

latest_sunspot_number (float)

Since these SSNs come only once per month, you will simply get the latest one, e.g. 76.9.
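Putting the three inputs together, a typical predict_dst implementation first collapses them into a fixed-length feature vector. The channels and statistics below are illustrative choices, not a prescribed feature set:

```python
import numpy as np
import pandas as pd


def build_features(
    solar_wind_7d: pd.DataFrame,
    satellite_positions_7d: pd.DataFrame,
    latest_sunspot_number: float,
) -> np.ndarray:
    """Collapse the three inputs into one flat feature vector.

    Uses mean/std of a few solar wind channels over the full window and
    over the most recent hour, plus the latest ACE position and the SSN.
    """
    channels = ["bt", "density", "speed", "bz_gsm"]
    sw = solar_wind_7d[channels]
    last_hour = sw.tail(60)  # 60 one-minute rows ~= the most recent hour
    stats = [sw.mean(), sw.std(), last_hour.mean(), last_hour.std()]
    pos = satellite_positions_7d[["gse_x_ace", "gse_y_ace", "gse_z_ace"]].iloc[-1]
    vec = pd.concat(stats + [pos]).to_numpy(dtype=float)
    return np.append(vec, latest_sunspot_number)
```

A vector like this (here 4 channels × 4 statistics + 3 position values + 1 SSN = 20 features) can then feed any regressor that predicts the two Dst values.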

For more detail on how to create and test your submission, visit the runtime repository.
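When assembling the archive locally, remember that predict.py must sit at the root of the zip, not inside a subdirectory. The standard library's zipfile module makes this easy to get right; a sketch, assuming your files live in the current directory:

```python
import zipfile
from pathlib import Path


def make_submission(files, out_path="submission.zip"):
    """Zip the given files at the archive root, as the checker expects."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            f = Path(f)
            # arcname strips any directory prefix so entries land at the root
            zf.write(f, arcname=f.name)
    return out_path
```

For example, make_submission(["predict.py", "model.pkl"]) produces a submission.zip whose entries are predict.py and model.pkl at the top level (the file names here are hypothetical).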

Runtime

Your code is executed within a container that is defined in our runtime repository. The limits are as follows:

  • Your submission must be written in Python (3.8.5) and use the packages defined in the runtime repository.
  • Your code may not read the files in /data directly. Doing so is grounds for disqualification. Instead, you will implement a function as described above. Using I/O or global variables to pass information between calls, or other attempts to circumvent the setup of this prediction challenge are grounds for disqualification. If in doubt whether something like this is okay, you may email us or post on the forum.
  • The submission must complete execution in 8 hours or less, and no single prediction can take more than 30 seconds (we expect each prediction to take far less time).
  • The container has access to 4 vCPUs and 14GB RAM. There are no GPUs available.
  • The container will not have network access. All necessary files (code and model assets) must be included in your submission.
  • The container execution will not have root access to the filesystem.

The cluster that executes your code is a resource shared across participants, so we ask that you be conscientious in your use of it. Please add progress information to your logs and cancel jobs that will run longer than the time limit. Canceled jobs won't count against your submission limit, which leaves more resources available to score submissions that will complete on time.

Requesting package installations

Since the Docker container will not have network access, all packages must be pre-installed. We are happy to consider additional packages as long as they are approved by the challenge organizers under operational constraints, do not conflict with the existing environment, and build successfully. Packages must be available through conda for Python 3.8.5. To request that an additional package be added to the Docker image, follow the instructions in the runtime repository.

Happy building! Once again, if you have any questions or issues you can always head on over to the user forum!