Wind-dependent Variables: Predict Wind Speeds of Tropical Storms

Hosted by Radiant Earth Foundation

$13,000

Problem description


In this challenge, you will estimate the wind_speed of a storm in knots at a given point in time using satellite imagery. The training data consist of single-band satellite images from 494 different storms in the Atlantic and East Pacific Oceans, along with their corresponding wind speeds. These images are captured at various times throughout the life cycle of each storm. Your goal is to build a model that outputs the wind speed associated with each image in the test set.

For each storm in the training and test sets, you are given a time-series of images with their associated relative time since the beginning of the storm. Your models may take advantage of the temporal data provided for each storm up to the point of prediction. Keep in mind that the goal of this competition is to produce an operational model that uses recent images to estimate future wind speeds.

Therefore, note that your solution may not use images captured later in a storm to estimate the wind speeds of images captured earlier in a storm. For storms represented in both the training and test sets, the test set images always temporally succeed the training set images and should never be used for training.
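One way to respect this constraint is to build each prediction's input only from images captured at or before the prediction time. The sketch below is a minimal illustration, not an official utility: the function name `causal_windows`, the `window` parameter, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict


def causal_windows(records, window=3):
    """Map each image_id to the IDs of up to `window` images from the same
    storm captured at or before it, ordered oldest to newest (the image
    itself is always the last entry)."""
    by_storm = defaultdict(list)
    for rec in records:
        by_storm[rec["storm_id"]].append(rec)

    windows = {}
    for storm_records in by_storm.values():
        # Order each storm's images by time since the storm began.
        storm_records.sort(key=lambda r: r["relative_time"])
        for i, rec in enumerate(storm_records):
            start = max(0, i - window + 1)
            # Slice ends at i, so no later image can leak into the window.
            windows[rec["image_id"]] = [
                r["image_id"] for r in storm_records[start:i + 1]
            ]
    return windows


# Made-up rows for illustration only (not taken from the real metadata).
records = [
    {"image_id": "abs_002", "storm_id": "abs", "relative_time": 3600},
    {"image_id": "abs_000", "storm_id": "abs", "relative_time": 0},
    {"image_id": "abs_001", "storm_id": "abs", "relative_time": 1800},
    {"image_id": "xyz_000", "storm_id": "xyz", "relative_time": 0},
]
windows = causal_windows(records, window=2)
```

With these rows, the window for abs_002 contains only abs_001 and abs_002, and the first image of each storm has a window containing just itself.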

Features


Images

The features in this dataset are the images themselves. Each image is 366 x 366 pixels. The train set consists of 70,257 images and the test set consists of 44,377 images. The images were captured by Geostationary Operational Environmental Satellites (GOES) positioned in a geostationary orbit around the Earth to be able to capture images at high temporal frequency. The data included in this competition is from band #13 (10.3 microns), which is a long-wave infrared frequency. Clouds have high brightness at this frequency, which helps to better capture the spatial structure of a storm and is key to estimating wind speed.

You can access the training and test images by either downloading the zip archives from the data download page or by using the Radiant MLHub API. For example code on using the API, check out this starter notebook.

The images are named {image_id}.jpg. The image IDs are composed of {storm_id}_{image_number}, where storm_id is a unique three letter code and image_number reflects the sequential ordering of images throughout that storm.

For example, if you had the train_images directory in data/, then listing the first few examples would give:

$ ls data/train_images | head -n 5
abs_000.jpg
abs_001.jpg
abs_002.jpg
abs_003.jpg
abs_004.jpg
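Because the filename encodes both the storm and the position in the sequence, it can be split back into its parts. This is a small sketch assuming the {storm_id}_{image_number}.jpg naming described above; `parse_image_filename` is a hypothetical helper name.

```python
from pathlib import Path


def parse_image_filename(filename):
    """Split a name like 'abs_004.jpg' into (storm_id, image_number)."""
    stem = Path(filename).stem  # drop the .jpg extension
    storm_id, image_number = stem.rsplit("_", 1)
    return storm_id, int(image_number)


parsed = parse_image_filename("abs_004.jpg")
```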

Here are three examples of storm images in the train set:

[Figure: three example storm images (Image Example 1, Image Example 2, Image Example 3)]

Metadata

In addition to the images, you are provided with metadata CSVs, one for the train set and one for the test set.

training_set_features.csv and test_set_features.csv each contain four columns:

  • image_id (type: str): unique image identifier (corresponds to the jpg filename)
  • storm_id (type: str): unique storm identifier
  • relative_time (type: int): time in seconds since the beginning of the storm
  • ocean (type: categorical): ocean identifier. Possible values: 1, 2.
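As an illustration of the column types, the sketch below reads a features file with the standard library and converts relative_time to an integer. The sample rows are invented for the example, and `load_features` is a hypothetical helper; only the column names come from the description above.

```python
import csv
import io

# Invented sample in the documented four-column layout.
SAMPLE_CSV = """image_id,storm_id,relative_time,ocean
abs_000,abs,0,2
abs_001,abs,1800,2
"""


def load_features(file_obj):
    """Read feature rows, casting relative_time (seconds) to int.
    ocean is left as a string because it is a categorical code."""
    rows = []
    for row in csv.DictReader(file_obj):
        row["relative_time"] = int(row["relative_time"])
        rows.append(row)
    return rows


features = load_features(io.StringIO(SAMPLE_CSV))
```

To read the real file, pass an opened file instead of the in-memory sample, for example `open("data/training_set_features.csv", newline="")`, assuming the CSV was saved under data/.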

Labels


The labels are the wind speeds for each image. training_set_labels.csv contains two columns:

  • image_id (type: str): image identifier
  • wind_speed (type: int): wind speed in knots
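Since the features and labels share image_id, the two tables can be joined on that key. The following is a minimal sketch with invented values; `attach_labels` is a hypothetical helper, and the wind speeds shown are not real data.

```python
def attach_labels(features, labels):
    """Return copies of the feature rows with wind_speed (knots) added,
    matched on image_id."""
    speed_by_id = {row["image_id"]: int(row["wind_speed"]) for row in labels}
    return [
        {**feat, "wind_speed": speed_by_id[feat["image_id"]]}
        for feat in features
    ]


# Invented rows for illustration only.
features = [
    {"image_id": "abs_000", "storm_id": "abs", "relative_time": 0, "ocean": "2"},
    {"image_id": "abs_001", "storm_id": "abs", "relative_time": 1800, "ocean": "2"},
]
labels = [
    {"image_id": "abs_001", "wind_speed": "44"},
    {"image_id": "abs_000", "wind_speed": "43"},
]
training_rows = attach_labels(features, labels)
```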

Performance metric


To measure your model’s performance, we’ll use a metric called Root Mean Square Error (RMSE), which is a measure of accuracy and quantifies differences between estimated and observed values. RMSE is the square root of the mean of the squared differences between the predicted values and the actual values. RMSE is always non-negative, with lower values indicating a better fit to the data.

Given the above definition, RMSE is computed as

$$ RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^N (y_i - \hat{y_i})^2 } $$

where

  • |$\hat{y_i}$| is the estimated wind speed
  • |$y_i$| is the actual wind speed
  • |$N$| is the number of samples

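In Python, you can calculate RMSE with the scikit-learn function sklearn.metrics.mean_squared_error(y_true, y_pred, squared=False). In recent scikit-learn releases, where the squared argument has been deprecated, the equivalent is sklearn.metrics.root_mean_squared_error(y_true, y_pred).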
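If scikit-learn is not available, the same quantity can be computed directly from the definition. This is a plain-Python sketch; the function name `rmse` and the example wind speeds are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between actual and estimated values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Errors are -3, 4, and 0 knots, so the squared errors sum to 25 over N = 3.
score = rmse([30, 45, 60], [33, 41, 60])
```

Here the score is the square root of 25/3, roughly 2.89 knots.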

Submission format


The format for the submission file is a .csv with two columns: image_id and wind_speed. Predictions for wind_speed must be integers. For example, 1 is a valid integer, but 1.0 is not.

If you predicted a wind speed of 0 knots for the first five images, your predictions would look like the following:

image_id wind_speed
acd_123 0
acd_124 0
acd_125 0
acd_126 0
acd_127 0

And the first few lines of the .csv file that you submit would look like:

image_id,wind_speed
acd_123,0
acd_124,0
acd_125,0
acd_126,0
acd_127,0
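A submission in this layout can be produced with the standard csv module. The sketch below rounds model outputs to the nearest integer before writing, which is one possible choice and not something the rules prescribe; `write_submission` and the predicted values are hypothetical.

```python
import csv
import io


def write_submission(predictions, file_obj):
    """Write (image_id, wind_speed) pairs as CSV with integer wind speeds."""
    writer = csv.writer(file_obj, lineterminator="\n")
    writer.writerow(["image_id", "wind_speed"])
    for image_id, wind_speed in predictions:
        # Cast to int so values are written as 35, not 35.0.
        writer.writerow([image_id, int(round(wind_speed))])


# Made-up model outputs for two of the example image IDs.
buffer = io.StringIO()
write_submission([("acd_123", 34.6), ("acd_124", 35.2)], buffer)
submission_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open("submission.csv", "w", newline="")`; the filename is an assumption.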

You can see an example of the format that your submission must conform to, including headers and row names, in submission_format.csv. Keep in mind that a different subset of the test set may be used on the private leaderboard to discourage overfitting, so scores displayed on the public leaderboard while the competition is running may not be the same as the final scores on the private leaderboard.


If you're wondering how to get started, check out our benchmark blog post!

Good luck and enjoy this challenge! If you have any questions you can always visit the user forum.