Problem description

The goal of this challenge is to develop models for forecasting Dst (Disturbance Storm-Time Index) that 1) push the boundary of predictive performance 2) under operationally viable constraints 3) using specified real-time solar-wind data feeds. More information on the dataset, performance metric, and submission specifications is provided below.

Finalists will be determined by performance on the private test set. These participants will then have the opportunity to submit their code to be audited using an out-of-sample verification set. The top 4 eligible teams that pass this final check will be awarded prizes.

Data
List of features
Labels

Performance metric
Root Mean Squared Error

Submission Format
Example

The features in this dataset

Overview

The input data for this challenge is composed of solar wind measurements collected from two satellites: NASA's Advanced Composition Explorer (ACE) and NOAA's Deep Space Climate Observatory (DSCOVR). Your goal is to predict the Disturbance Storm-Time Index (Dst), a measure of magnetic activity, from the provided data up to the time of prediction.

Dst values are measured by four ground-based observatories near the equator. These values are then averaged to provide a measurement of Dst for any given hour. However, these values are not always provided in a timely manner.

Thus, your task is to build a model that can predict Dst in real time for both the current hour and the next hour. For example, if the current timestep is 10:00 am, you must predict Dst for both 10:00 am and 11:00 am using data up until, but not including, 10:00 am.

Forecast Dst solely from solar-wind observations at the Lagrangian (L1) position using satellite data from NOAA’s Deep Space Climate Observatory (DSCOVR) and NASA's Advanced Composition Explorer (ACE).

Note: This is a real-time prediction task. Therefore, your solution may not use data captured at the time of prediction or later to predict Dst, and it may not take Dst as an input.
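As a rough illustration of the rule above, the sketch below keeps only the readings that fall strictly before the prediction time t0. The helper name `history_before`, the seven-day default window, and the tiny synthetic frame are assumptions made for this example; they are not part of the competition code.

```python
import pandas as pd


def history_before(solar_wind: pd.DataFrame, t0: pd.Timedelta,
                   window: pd.Timedelta = pd.Timedelta(days=7)) -> pd.DataFrame:
    """Return rows with t0 - window <= timedelta < t0 (t0 itself is excluded)."""
    td = solar_wind["timedelta"]
    mask = (td >= t0 - window) & (td < t0)
    return solar_wind.loc[mask]


# Synthetic readings (hypothetical values): one per minute from 09:58 to 10:01.
demo = pd.DataFrame({
    "timedelta": pd.to_timedelta(["09:58:00", "09:59:00", "10:00:00", "10:01:00"]),
    "bt": [6.8, 6.9, 7.0, 7.1],
})

# Predicting at t0 = 10:00 may only use the 09:58 and 09:59 readings.
usable = history_before(demo, pd.Timedelta(hours=10))
print(usable)
```

The strict `<` comparison is what encodes "up until but not including" the current timestep.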

To ensure similar distributions between the training and test data, the data is separated into three non-contiguous periods. All data are provided with a period and timedelta multi-index which indicates the relative timestep for each observation within a period, but not the real timestamp. The period identifiers and timedeltas are common across datasets.
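One way to work with the period and timedelta multi-index is sketched below with pandas. It assumes the files are comma-separated with a header row; only the first data row is taken from the example in this description, and the other rows are made up for illustration. The helper name `load_indexed` is hypothetical.

```python
import io

import pandas as pd

# Stand-in for dst_labels.csv. Rows after the first are synthetic.
CSV_TEXT = """period,timedelta,dst
train_a,0 days 00:00:00,-7
train_a,0 days 01:00:00,-10
train_b,0 days 00:00:00,-5
"""


def load_indexed(source) -> pd.DataFrame:
    """Read a challenge-style CSV and index it by (period, timedelta)."""
    df = pd.read_csv(source)
    # Strings such as "0 days 01:00:00" parse directly into Timedelta values.
    df["timedelta"] = pd.to_timedelta(df["timedelta"])
    return df.set_index(["period", "timedelta"]).sort_index()


labels = load_indexed(io.StringIO(CSV_TEXT))

# Look up a single observation, then pull out one whole period.
first = labels.loc[("train_a", pd.Timedelta(0)), "dst"]
train_a = labels.xs("train_a", level="period")
print(first, len(train_a))
```

Because the timedeltas restart within each period, any rolling or shifting operation should be applied per period rather than across the whole file.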

Training Data

Three different time-series datasets are provided as features, in addition to the Dst labels:

filename	description	frequency
solar_wind.csv	Solar wind data collected from ACE and DSCOVR satellites	minutely
sunspots.csv	Smoothed sunspot counts	monthly
satellite_positions.csv	Coordinate positions for ACE and DSCOVR	daily
dst_labels.csv	Dst values averaged across the four stations	hourly

One more note about the training data: historical training data is provided to you, but the testing environment will be set up to simulate a real-time environment. It is up to you to figure out how to align and use the appropriate training data, being careful not to leak any "future" information.
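A possible alignment, sketched under assumptions: aggregate the minutely solar wind readings into hourly statistics and stamp each hour's statistics with the following hour, so that the row for t0 only summarizes minutes before t0. The function name `hourly_features`, the choice of mean and standard deviation, and the synthetic data are illustrative only.

```python
import numpy as np
import pandas as pd

HOUR = pd.Timedelta(hours=1)


def hourly_features(solar_wind: pd.DataFrame, cols=("bt", "speed")) -> pd.DataFrame:
    """Summarize minutes in [t0 - 1h, t0) and index the result by t0."""
    sw = solar_wind.copy()
    # Floor each reading to its hour, then add one hour: a reading taken at
    # 09:37 contributes to the feature row for t0 = 10:00, never to 09:00.
    sw["timedelta"] = (sw["timedelta"] // HOUR + 1) * HOUR
    agg = sw.groupby(["period", "timedelta"])[list(cols)].agg(["mean", "std"])
    agg.columns = ["_".join(pair) for pair in agg.columns]
    return agg


# Two hours of synthetic minutely data (hypothetical values).
demo = pd.DataFrame({
    "period": "train_a",
    "timedelta": pd.to_timedelta(np.arange(120), unit="min"),
    "bt": np.arange(120, dtype=float),
    "speed": 400.0,
})
feats = hourly_features(demo)
print(feats)
```

The resulting frame shares the (period, timedelta) index with the labels, so it can be joined to them directly; the first hour of each period has no feature row because nothing precedes it.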

For details on the real-time data constraints, refer to the code submission page.

Solar Wind Data

The primary feature data are provided in solar_wind.csv. They are composed of solar-wind readings from the ACE and DSCOVR satellites:

bx_gse - Interplanetary-magnetic-field (IMF) X-component in geocentric solar ecliptic (GSE) coordinates (nanotesla (nT))
by_gse - Interplanetary-magnetic-field Y-component in GSE coordinates (nT)
bz_gse - Interplanetary-magnetic-field Z-component in GSE coordinates (nT)
theta_gse - Interplanetary-magnetic-field latitude in GSE coordinates (defined as the angle between the magnetic vector B and the ecliptic plane, being positive when B points North) (degrees)
phi_gse - Interplanetary-magnetic-field longitude in GSE coordinates (the angle between the projection of the IMF vector on the ecliptic and the Earth–Sun direction) (degrees)
bx_gsm - Interplanetary-magnetic-field X-component in geocentric solar magnetospheric (GSM) coordinates (nT)
by_gsm - Interplanetary-magnetic-field Y-component in GSM coordinates (nT)
bz_gsm - Interplanetary-magnetic-field Z-component in GSM coordinates (nT)
theta_gsm - Interplanetary-magnetic-field latitude in GSM coordinates (degrees)
phi_gsm - Interplanetary-magnetic-field longitude in GSM coordinates (degrees)
bt - Interplanetary-magnetic-field component magnitude (nT)
density - Solar wind proton density (N/cm^3)
speed - Solar wind bulk speed (km/s)
temperature - Solar wind ion temperature (Kelvin)
source - Starting in 2016, the solar wind data for any given point in time can be sourced from either DSCOVR or ACE satellites depending on the quality. "ac" denotes it was sourced from ACE, and "ds" from DSCOVR.

Example row:

column	value
period	train_a
timedelta	0 days 00:00:00
bx_gse	-5.55
by_gse	3.0
bz_gse	1.25
theta_gse	11.09
phi_gse	153.37
bx_gsm	-5.55
by_gsm	3.0
bz_gsm	1.25
theta_gsm	11.09
phi_gsm	153.37
bt	6.8
density	1.53
speed	383.92
temperature	110237.0
source	ac
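To make the latitude and longitude definitions above concrete, the sketch below recomputes both angles from the three GSE components of the example row. The function name `imf_angles` is hypothetical. The results land within a couple of degrees of the listed theta_gse and phi_gse but do not match exactly; a plausible reason, stated here as an assumption, is that each column in the file is averaged separately over the measurement interval.

```python
import math


def imf_angles(bx: float, by: float, bz: float) -> tuple:
    """Return (latitude, longitude) in degrees from field components in nT."""
    # Latitude: angle between B and the XY (ecliptic) plane, positive northward.
    theta = math.degrees(math.atan2(bz, math.hypot(bx, by)))
    # Longitude: angle of the in-plane projection from the +X (Earth-Sun) axis,
    # wrapped into the range [0, 360).
    phi = math.degrees(math.atan2(by, bx)) % 360.0
    return theta, phi


# Components from the example row above.
theta, phi = imf_angles(-5.55, 3.0, 1.25)
print(round(theta, 2), round(phi, 2))
```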

In the code execution environment, you'll be provided one week of historical solar wind data per prediction.

Illustrative example of solar-wind data features and Dst over time. The selected features in green (up to t0) are used to predict the Dst in cyan (at t0 and at t+1). Keep in mind that your solution should work on real-time data streams with realistic noise including sensor malfunctions and data gaps.
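One simple way to cope with data gaps is sketched below: carry the last valid reading forward for a limited number of steps, then fall back to a constant. This is only one option among many. The function name `fill_gaps`, the gap limit, and the fallback value are assumptions chosen for illustration. Forward filling is used because it never looks ahead in time.

```python
import numpy as np
import pandas as pd


def fill_gaps(values: pd.Series, max_gap: int = 2, fallback: float = 0.0) -> pd.Series:
    """Forward-fill at most `max_gap` consecutive NaNs, then use `fallback`."""
    carried = values.ffill(limit=max_gap)
    # Anything still missing (long outages, or a gap at the very start)
    # is replaced by the constant fallback.
    return carried.fillna(fallback)


# Synthetic series with a three-step outage and a trailing gap.
raw = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0, np.nan])
clean = fill_gaps(raw, max_gap=2, fallback=-1.0)
print(clean.tolist())
```

In practice the fallback would more sensibly be a statistic computed from the training data, such as a per-column median.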

Satellite Coordinate Data

The ACE and DSCOVR satellites are not stationary. They orbit the L1 point, which stays in a relatively constant position with respect to the Earth as the Earth revolves around the Sun.

satellite_positions.csv records the daily positions of the DSCOVR and ACE spacecraft in Geocentric Solar Ecliptic (GSE) coordinates for projections in the XY, XZ, and YZ planes. The columns for each spacecraft are denoted by the suffixes _ace or _dscovr.

gse_x - Position of the satellite in the X direction of GSE coordinates (km)
gse_y - Position of the satellite in the Y direction of GSE coordinates (km)
gse_z - Position of the satellite in the Z direction of GSE coordinates (km)

Example row:

period	timedelta	gse_x_ace	gse_y_ace	gse_z_ace	gse_x_dscovr	gse_y_dscovr	gse_z_dscovr
train_a	0 days 00:00:00	1522376.9	143704.6	149496.7	NaN	NaN	NaN

Note that some dates are missing for DSCOVR. In the code execution environment, you'll be provided the last 7 days of satellite positions.
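As a quick sanity check on these coordinates, the sketch below computes a satellite's distance from the Earth, which is the origin of the GSE frame. For the ACE values in the example row this comes out at roughly 1.5 million km, consistent with a position near L1. The function name `distance_from_earth_km` is hypothetical, and returning `None` for missing DSCOVR coordinates is just one possible convention.

```python
import math
from typing import Optional


def distance_from_earth_km(x: float, y: float, z: float) -> Optional[float]:
    """Euclidean distance from the GSE origin, or None if a coordinate is NaN."""
    if any(math.isnan(v) for v in (x, y, z)):
        return None
    return math.sqrt(x * x + y * y + z * z)


# ACE position from the example row above (km).
ace_km = distance_from_earth_km(1522376.9, 143704.6, 149496.7)
# DSCOVR is missing in that row.
dscovr_km = distance_from_earth_km(float("nan"), float("nan"), float("nan"))
print(ace_km, dscovr_km)
```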

Sunspot Numbers

The Sun exhibits a well-known, periodic variation in the number of spots on its disk over a period of about 11 years, called a solar cycle. In general, large geomagnetic storms occur more frequently during the peak of these cycles. Sunspot numbers might allow for calibration of models to the solar cycle.

Smoothed sunspot numbers are provided at a monthly frequency. Because sunspot numbers are reliably projected, the most current month will be provided for each prediction in the real-time environment.

Sunspots are indexed according to the first corresponding day in dst_labels.csv.

Example row:

period	timedelta	smoothed_ssn
train_a	0 days 00:00:00	65.4
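Since sunspot numbers arrive monthly while predictions are hourly, each hour needs to be matched with the most recent sunspot value. A minimal sketch using pandas `merge_asof` follows. The function name `attach_sunspots` is hypothetical, and apart from the 65.4 in the example row all values are synthetic.

```python
import pandas as pd


def attach_sunspots(hourly: pd.DataFrame, sunspots: pd.DataFrame) -> pd.DataFrame:
    """Attach the latest smoothed_ssn at or before each hourly timestep."""
    merged = pd.merge_asof(
        hourly.sort_values("timedelta"),
        sunspots.sort_values("timedelta"),
        on="timedelta",
        by="period",            # never match across periods
        direction="backward",   # only look at the current or an earlier month
    )
    return merged.sort_values(["period", "timedelta"]).reset_index(drop=True)


sunspots = pd.DataFrame({
    "period": ["train_a", "train_a"],
    "timedelta": pd.to_timedelta(["0 days", "31 days"]),
    "smoothed_ssn": [65.4, 70.0],   # second value is made up
})
hourly = pd.DataFrame({
    "period": ["train_a"] * 3,
    "timedelta": pd.to_timedelta(
        ["0 days 01:00:00", "30 days 00:00:00", "31 days 01:00:00"]
    ),
})
with_ssn = attach_sunspots(hourly, sunspots)
print(with_ssn)
```

The same pattern could be reused for the daily satellite positions.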

Labels

The labels are hourly Dst values, indexed using the same period and timedelta multi-index. Your task is to predict the current timestep (t0) and the following timestep (t+1). Remember that you are not allowed to use historical Dst values as an input for prediction.

Example row:

period	timedelta	dst
train_a	0 days 00:00:00	-7
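The two prediction targets can be derived from the hourly labels by shifting within each period, as in the sketch below. The column names `t0` and `t1` and the helper `make_targets` are assumptions for this example, and every label value other than the -7 shown above is synthetic.

```python
import pandas as pd


def make_targets(labels: pd.DataFrame) -> pd.DataFrame:
    """Add t0 (current hour) and t1 (next hour) target columns."""
    out = labels.sort_values(["period", "timedelta"]).copy()
    out["t0"] = out["dst"]
    # Shift inside each period so the last hour of one period is never
    # paired with the first hour of the next.
    out["t1"] = out.groupby("period")["dst"].shift(-1)
    return out


demo = pd.DataFrame({
    "period": ["train_a"] * 3 + ["train_b"] * 2,
    "timedelta": pd.to_timedelta([0, 1, 2, 0, 1], unit="h"),
    "dst": [-7, -10, -10, -5, -6],
})
targets = make_targets(demo)
print(targets)
```

The final row of each period has no t+1 label and would typically be dropped before training. The dst column is used here only to build targets, never as a model input.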

Performance metric

Performance is evaluated according to Root Mean Squared Error (RMSE). RMSE will be calculated on t0 and t+1 simultaneously.

RMSE is defined as:

$$ RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 } $$

where:

$\hat{y}_i$ is the estimated Dst value for sample $i$ (t0 or t+1)
$y_i$ is the recorded Dst value for sample $i$ (t0 or t+1)
$N$ is the number of samples

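This metric is implemented in scikit-learn as mean_squared_error, with the squared parameter set to False. Recent scikit-learn releases also provide root_mean_squared_error for the same purpose.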
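For reference, the metric can also be written in a few lines of plain Python. The sketch below assumes that "simultaneously" means the t0 and t+1 errors are pooled into a single RMSE; the function names `rmse` and `pooled_rmse` are made up for this example, and the numbers are synthetic.

```python
import math


def rmse(y_true, y_pred) -> float:
    """Root mean squared error over two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pooled_rmse(true_t0, pred_t0, true_t1, pred_t1) -> float:
    """Assumed pooling: treat t0 and t+1 predictions as one set of samples."""
    return rmse(list(true_t0) + list(true_t1), list(pred_t0) + list(pred_t1))


# Synthetic check: perfect at t0, off by 2 nT at t+1.
score = pooled_rmse([-7.0], [-7.0], [-10.0], [-8.0])
print(score)
```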

Submission format

This is a code execution challenge! Rather than submitting your predicted labels, you'll package everything needed to do inference and submit that for containerized execution.

The execution environment will be simulating real-time conditions, subject to data availability constraints. See complete details on making your executable code submission here.

Good luck!

Good luck and enjoy this problem! Check out the benchmark blog post for tips on how to get started. If you have any questions you can always visit the user forum.

MagNet: Model the Geomagnetic Field

Quick Facts

Winner

Ammarali32
