Problem description
In this challenge, you will predict the pushback time of departing flights (how long until the plane departs from the gate) using features that capture air traffic and weather conditions. Your goal is to build a model that predicts the minutes until pushback time. It is useful to predict pushback time at various points before a plane’s scheduled departure, so your model will make predictions from roughly an hour before scheduled pushback until the actual pushback time. For each flight, you'll generate predictions based on the information available at a variety of different times leading up to the actual pushback. Each prediction is a unique combination of flight ID, airport, and time (here we'll call this prediction time).
Finalists from Phase 1 will be required to participate in Phase 2, during which they will work with NASA to train a federated version of their model. It is easier to combine weights for certain types of models, and therefore some types of models are more easily federated than others. Keep this in mind as you develop your solutions!
- Performance metric
- Mean Absolute Error
Timeline and leaderboard
All participants can enter the Open Arena of this challenge (you are here!). In the Open Arena, participants can work on their solutions and get live feedback from a public leaderboard. Participants who attest to their eligibility can enter the Prescreened Arena where participants will submit executable code submissions that determine the final rankings.
Phase 1: Open model development | Phase 1: Code execution (Prescreened only) | Phase 2: Federated learning (Phase 1 finalists only) |
---|---|---|
Feb 1 - Apr 17, 2023 | Feb 22 - Apr 17, 2023 | Jun 1 - Aug 31, 2023 |
Submit pushback predictions for the validation set to the Open Arena. Scores are displayed on the Open Arena public leaderboard. | Submit code to the Prescreened Arena, which we will execute to compute pushback predictions for the test set. These scores are displayed on the Prescreened public leaderboard and will be used to determine prize rankings. | Finalists from Phase 1 will train a federated version of their model using NASA's federated learning platform. |
Phase 1: Model development
Upon registering for the contest, participants will enter the Open Arena (you are here!). The Open Arena provides access to a training data set including 10 airports over approximately two years, as well as feature data for a withheld validation set sampled from the same time range.
Throughout this phase, participants can submit and score predicted pushback times to get feedback on their model’s performance. Once a participant submits predictions according to the Submission Format, those predictions will be compared to the ground truth and the score (mean absolute error) will be shown on the public leaderboard.
Pre-screening: In order to be eligible for the Prescreened Arena, participants must submit an attestation that they are eligible to participate according to the rules. Finalists from the Prescreened Arena will have to submit proof of eligibility.
Once successfully prescreened, participants can:
- Enter the Prescreened Arena.
- Make executable code submissions to the containerized test harness, which will execute submissions on the Prescreened Arena test data and produce scores for the Prescreened public leaderboard. The Prescreened private leaderboard determines final ranking. More detailed instructions will be provided in the Prescreened Arena.
Solutions in the Prescreened Arena may use both training and validation data from the Open Arena as training data for submissions. The test harness will execute solutions on a disjoint sample with a similar distribution as the Open Arena.
Phase 2: Federated Learning
Finalists from Phase 1 will be required to train a federated version of their models in Phase 2. Finalists will use a model-agnostic federated learning platform to re-train their model. Some features will remain centralized, while others will be partitioned by airline. Successfully federated models will be able to aggregate model weights from an arbitrary number of partitions. Read on for more detail about the competition's federated learning requirements.
The goal of Phase 2 is to explore federated learning techniques that enable independent clients (also known as parties, organizations, groups) to jointly train a global model without sharing or pooling data. In this case, the clients are airlines. Federated learning is a perfect match for the problem of pushback time prediction because airlines collect a lot of information that is relevant to pushback time, but too valuable or sensitive to share, like the number of passengers that have checked in for a flight or the number of bags that have been loaded onto a plane. Federated learning enables airlines to safely contribute the valuable and sensitive data they collect towards a centralized model. Each finalist from Phase 1 will be invited to translate their winning Phase 1 model into a model that can be trained in a federated manner. The data used in Phase 2 will be the same data as used in Phase 1, but it will be divided into:
- "Public" or "non-federated" variables for which all rows are available to all airlines
- "Private" or "federated" variables for which only rows corresponding to a flight that an airline operates are available to that airline
The exact variables airlines would want to protect by federating are, by definition, too sensitive to release for a competition. Since we don't have access to the actual airline data that would be federated, we'll simulate the federated scenario by treating a subset of variables as if they are private. All of the federated variables for Phase 2 come from the Phase 1 `mfs` and `standtimes` datasets:
Federated `mfs` variables:

- `aircraft_engine_class`
- `aircraft_type`
- `major_carrier`
- `flight_type`

Federated `standtimes` variables:

- `departure_stand_actual_time`
Each airline will have access to its federated variables and all non-federated variables. Airlines will not have access to federated variables from other airlines. In Phase 2, these data constraints apply during training and prediction.
Below is an example of airline-specific federated variables using a sample of the `KMEM_mfs.csv.bz2` file:
gufi | aircraft_engine_class | aircraft_type | major_carrier | flight_type | isdeparture |
---|---|---|---|---|---|
AAL1007.DFW.MEM.211224.0015.0120.TFM | JET | B738 | AAL | SCHEDULED_AIR_TRANSPORT | False |
AAL1007.DFW.MEM.211231.0015.0066.TFM | JET | B738 | AAL | SCHEDULED_AIR_TRANSPORT | False |
AAL1010.DFW.STL.210831.0153.0171.TFM | JET | A320 | AAL | SCHEDULED_AIR_TRANSPORT | False |
AAL1017.DFW.MEM.220205.0035.0087.TFM | JET | A320 | AAL | SCHEDULED_AIR_TRANSPORT | False |
AAL1017.DFW.MEM.220207.0035.0088.TFM | JET | A320 | AAL | SCHEDULED_AIR_TRANSPORT | False |
AAL1017.DFW.MEM.220208.0035.0135.TFM | JET | A320 | AAL | SCHEDULED_AIR_TRANSPORT | False |
UAL2477.MEM.EWR.220710.1915.0167.TFM | JET | A320 | UAL | SCHEDULED_AIR_TRANSPORT | True |
UAL2477.MEM.EWR.220711.1915.0070.TFM | JET | A319 | UAL | SCHEDULED_AIR_TRANSPORT | True |
UAL2477.MEM.EWR.220712.1915.0144.TFM | JET | A319 | UAL | SCHEDULED_AIR_TRANSPORT | True |
UAL2477.MEM.EWR.220713.1915.0064.TFM | JET | A319 | UAL | SCHEDULED_AIR_TRANSPORT | True |
The airline is indicated as a code at the beginning of the GUFI, e.g., "AAL" is American Airlines and "UAL" is United Airlines. During training, you will simulate an AAL client that trains using its own rows of the federated variables (the AAL rows above) plus any of the public data and transmits model updates to a centralized server. A UAL client (using only the UAL rows) and clients for the other airlines will do the same. The centralized server decides how to aggregate all of the individual model weights into a single global model. During prediction, each airline will use the final trained model to make pushback predictions for all of the flights it operates. Your federated learning approach will be scored on the same time period and same flights as the test set from Phase 1; the only difference is that the federated variables can only be accessed by the airline that produced them.
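To make the airline partitioning concrete, here is a minimal pandas sketch of how a simulated client could be restricted to its own airline's rows by parsing the carrier code from the GUFI. The two-row sample frame is made up for illustration; the real `mfs` files have more columns.

```python
import pandas as pd

# Hypothetical sample resembling the KMEM mfs extract above
mfs = pd.DataFrame({
    "gufi": [
        "AAL1007.DFW.MEM.211224.0015.0120.TFM",
        "UAL2477.MEM.EWR.220710.1915.0167.TFM",
    ],
    "aircraft_type": ["B738", "A320"],
    "major_carrier": ["AAL", "UAL"],
})

# The airline code is the leading letters of the GUFI (e.g., "AAL", "UAL")
mfs["airline"] = mfs["gufi"].str.extract(r"^([A-Z]+)", expand=False)

# Each simulated client sees only its own rows of the federated variables
clients = {airline: rows for airline, rows in mfs.groupby("airline")}
```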
In the spirit of experimentation, you will have some flexibility in how you translate your Phase 1 winning model during Phase 2. Do some strategies for combining the models from individual clients work better than others? Are there basic changes that your model needs to operate in a federated setting? We want to know! We will release more of the specific requirements as Phase 2 ramps up, but for now get started thinking about what it will take to turn your winning centralized model into an effective federated model!
About Federated Learning
There are a vast number of public and private organizations that collect and provide flight and airspace related data. However, privacy and intellectual property concerns prevent much of this data from being aggregated, and thus hamper the ability of models and analysts to make the best predictions and decisions.
Federated learning (FL), also known as collaborative learning, is a technique for collaboratively training a shared machine learning model across data from multiple parties while preserving each party's data privacy. Federated learning stands in contrast to the typical centralized machine learning, where the training data needs to be collected and centralized for training. Requiring the parties to share their data compromises the privacy of that data!
In Phase 1, your model does not have to be federated. However, you should choose an architecture with federation in mind. For example, linear models where weights can be aggregated with simple addition are easier to train in a federated way than tree-based models. You can learn more about federated learning on the About page.
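As an illustration of why weight aggregation is simple for linear models, here is a toy sketch of one-shot federated averaging: each hypothetical client fits ordinary least squares on its own synthetic rows, and a server averages the coefficients weighted by sample counts. All data is synthetic, and this is not the Phase 2 platform's actual protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    # Ordinary least squares on one client's local data
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Two hypothetical airline clients with private rows of the same features
clients = []
for n in (200, 100):
    X = rng.normal(size=(n, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

# Server-side aggregation: sample-weighted average of client coefficients
coefs = [fit_linear(X, y) for X, y in clients]
weights = np.array([len(X) for X, _ in clients], dtype=float)
global_coef = np.average(coefs, axis=0, weights=weights)
```

Averaging tree structures from different clients is far less straightforward, which is why model choice matters here.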
Data
This challenge is possible because of the effort that NASA, the FAA, airlines, and other agencies undertake to collect, process, and distribute data to decision makers in near real-time. You will be working with around two years of historical data, but any solution you develop could be translated directly into a pipeline with access to data collected in real time.
10 airports whose data is included in this competition
The data download page contains a tar archive for each of these airports. The structure of each archive is:
├── <airport>
│ ├── <airport>_config.csv.bz2
│ ├── <airport>_etd.csv.bz2
│ ├── <airport>_first_position.csv.bz2
│ ├── <airport>_lamp.csv.bz2
│ ├── <airport>_mfs.csv.bz2
│ ├── <airport>_runways.csv.bz2
│ ├── <airport>_standtimes.csv.bz2
│ ├── <airport>_tbfm.csv.bz2
│ └── <airport>_tfm.csv.bz2
└── train_labels_<airport>.csv.bz2
Read on to learn more about each of these files!
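Each archive member is a bz2-compressed CSV, which pandas can read directly. A self-contained sketch (it writes and reads back a tiny stand-in file, since the real archives aren't assumed here; the real files follow the same pattern):

```python
import pandas as pd

# Tiny stand-in for one archive member, e.g., KMEM_etd.csv.bz2
sample = pd.DataFrame({
    "gufi": ["AAL1007.DFW.MEM.211224.0015.0120.TFM"],
    "timestamp": ["2021-12-23 23:00:00"],
})
sample.to_csv("KMEM_etd_sample.csv.bz2", index=False)

# pandas infers bz2 compression from the .bz2 extension
etd = pd.read_csv("KMEM_etd_sample.csv.bz2", parse_dates=["timestamp"])
```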
It is up to you to ensure that you are filtering feature data properly during model development in the Open Arena according to the following restrictions:
Time: This is a real-time estimation task. You may only use data that was available from the previous 30 hours up through the time of estimation when generating predictions. For example, if you are predicting pushback for a flight on December 31, 2022 at 12 pm, you can use any data that was available between 6 am on 12/30 and 12 pm on 12/31. Most feature data includes a `timestamp` column that indicates the time that the observation was made available; use this column for filtering (more details below).
Each prediction should be treated as an independent observation. That means that you should not use past predictions as input for inference. In the Prescreened code execution environment, you will not be able to keep track of past predictions or features.
Location: You may only use data from the airport from which a flight is departing. The submission format will include an `airport` column which you can use to filter. All input data will be partitioned by airport.
These restrictions won't be enforced in the Open Arena, but will be automatically enforced in the Prescreened Arena's code execution harness. Filtering your data properly in the Open Arena will give you a much more accurate sense of how your model will perform in the final evaluation.
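The time restriction can be sketched as a simple pandas filter (function and frame names are illustrative; feature files are already partitioned by airport, so only the time filter is shown):

```python
import pandas as pd

def filter_features(features, prediction_time):
    """Keep only rows available at prediction_time: timestamps within
    the 30 hours up to and including prediction_time."""
    window_start = prediction_time - pd.Timedelta(hours=30)
    return features[
        (features["timestamp"] >= window_start)
        & (features["timestamp"] <= prediction_time)
    ]

# Illustrative rows matching the 12/30 6 am - 12/31 12 pm example above
features = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-12-30 05:59:00",  # too old: outside the 30-hour window
        "2022-12-30 06:00:00",  # oldest allowable observation
        "2022-12-31 11:00:00",  # allowable
        "2022-12-31 12:30:00",  # from the future: not allowable
    ])
})
allowed = filter_features(features, pd.Timestamp("2022-12-31 12:00:00"))
```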
Feature data
The feature data for this competition includes information about air traffic and weather conditions.
Air traffic
This competition uses air traffic data from Fuser, a data processing platform designed by NASA as part of the ATD-2 project. Fuser processes the FAA's raw data stream and distributes cleaned, real-time data on the status of individual flights nationwide.
On the data download page there is a separate tar archive for each airport. Each tar archive contains the files listed below.
Actual departure time and runway code
`<airport>_runways.csv` has one row for each flight and contains the following columns:

- `departure_runway_actual`: The flight's actual departure runway as a runway code, e.g., 18R, 17C, etc.
- `departed_runway_actual_time`: The time that the flight departed from the runway
- `arrival_runway_actual`: The flight's actual arrival runway as a runway code, e.g., 18R, 17C, etc.
- `arrival_runway_actual_time`: The time that the flight arrived at the runway
Airport configuration
`<airport>_config.csv` describes the active runway configuration at different times. Runway configuration is the combination of runways used for arrivals and departures and the flow direction on those runways. Each row is a different time at the given airport, and the columns are:

- `timestamp`: The time that the Fuser system received the data (use this for filtering)
- `start_time`: The time the configuration took effect
- `arrival_runways`: The active arrival runway configuration starting from `start_time`, as comma-separated runway codes, e.g., 18R, 17C, etc.
- `departure_runway`: The active departure runway configuration starting from `start_time`, as comma-separated runway codes, e.g., 18R, 17C, etc.
Estimated departure times
`<airport>_etd.csv` may have multiple rows for each flight and contains the following columns:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `timestamp`: The time that the prediction was generated
- `estimated_runway_departure_time`: Estimated time that the flight will depart from the runway
Estimated arrival times
TFM (traffic flow management) and TBFM (time-based flow management) are two FAA systems that track flights in the National Airspace System (NAS). TFM and TBFM forecast the estimated time of arrival (ETA) continuously throughout the duration of a flight.
TFM forecasts are available as `<airport>_tfm.csv` and contain the following columns:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `timestamp`: The time that the prediction was generated
- `arrival_runway_estimated_time`: Estimated time that the flight will arrive at the runway
TBFM forecasts are available as `<airport>_tbfm.csv` and contain the following columns:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `timestamp`: The time that the prediction was generated
- `scheduled_runway_estimated_time`: Scheduled time that the flight will arrive at the runway
MFS event times
MFS provides actual event times and flight information during the lifecycle of a flight. MFS data is provided in two files: metadata (`<airport>_mfs.csv`) and stand times (`<airport>_standtimes.csv`).

MFS metadata are available as `<airport>_mfs.csv` and contain critical information about the flight. MFS metadata differ from the other features in that they do not include a timestamp. It is assumed that these flight metadata are available for any flight for which a GUFI exists. However, certain uses of this metadata CSV violate the real-time constraints of the problem.
You may:
- Look up (or "join") metadata for GUFIs that you already know exist within the valid time window from other timestamped features.
- Use MFS data for the current flight that predictions are being generated for.
You may not:
- Use the MFS metadata to look for information about flights for which you do not already have a GUFI from timestamped features.
- Analyze the entire MFS metadata file to, for example, directly incorporate the distribution of aircraft type, carriers, etc. into your solution.
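In pandas terms, the allowed usage is a lookup join keyed on GUFIs you already hold from timestamped features. All data below is made up for illustration:

```python
import pandas as pd

# GUFIs you already know about from a timestamped feature (e.g., etd)
known_gufis = pd.DataFrame({"gufi": ["AAL1007.DFW.MEM.211224.0015.0120.TFM"]})

# Hypothetical slice of MFS metadata
mfs = pd.DataFrame({
    "gufi": [
        "AAL1007.DFW.MEM.211224.0015.0120.TFM",
        "UAL2477.MEM.EWR.220710.1915.0167.TFM",
    ],
    "aircraft_type": ["B738", "A320"],
})

# Allowed: left-join metadata onto the GUFIs you already have. Scanning
# the full mfs file to build distributions over all flights is NOT allowed.
features = known_gufis.merge(mfs, on="gufi", how="left")
```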
`<airport>_mfs.csv` has one row for each flight and contains the columns:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `aircraft_engine_class`: The class of engine
- `aircraft_type`: The type of aircraft
- `major_carrier`: The airline carrier
- `flight_type`: The type of flight
- `is_departure`: `True` if the flight is a departure, else `False` if it is an arrival
`<airport>_standtimes.csv` may have multiple rows for each flight and contains the actual arrival and departure times to/from a gate. For a given flight being predicted, this information is not available at the time of prediction. The columns are:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `timestamp`: The time that the Fuser system received the data
- `arrival_stand_actual_time`: The time the flight arrived at the gate at the destination airport
- `departure_stand_actual_time`: The time the flight departed the gate (the pushback time). At prediction time, this variable is known for other flights that have already pushed back, but is not available for the flight being predicted.
A note on leakage: In the Open Arena, you will have access to the `departure_stand_actual_time` (the pushback time) for all flights. It will be easy to leak this data into training, which would mean that your performance does not accurately reflect how your model will do in the Prescreened Arena. In the Prescreened Arena, you will be asked to provide a function that can produce predictions for a set of flights at a single airport and timestamp pair. The function will be passed all allowable data, including feature data from the past 30 hours up until the time of prediction. This does not include the actual pushback time for a given flight. Additionally, you will not be allowed to bookkeep or save data in between predictions during inference.
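To give a feel for the constraint, here is a hypothetical sketch of such a function. The signature and argument names are assumptions for illustration only; the official interface will be specified in the Prescreened Arena.

```python
import pandas as pd

def predict(etd, mfs, lamp, partial_submission):
    """Hypothetical shape of a function called once per (airport, timestamp)
    pair. It must be stateless: no caching or bookkeeping between calls.
    Placeholder logic fills in a constant guess instead of a real model."""
    preds = partial_submission.copy()
    preds["minutes_until_pushback"] = 15
    return preds

# Illustrative call with a one-row slice of the submission format
partial = pd.DataFrame({
    "gufi": ["AAL1008.ATL.DFW.210607.2033.0110.TFM"],
    "timestamp": ["2021-06-08 19:15:00"],
    "airport": ["KATL"],
    "minutes_until_pushback": [0],
})
out = predict(None, None, None, partial)
```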
First position
`<airport>_first_position.csv` contains the following columns:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `timestamp`: The time that a flight was first tracked by the NAS systems
Note that while first position is tracked for all flights in NAS, an airport's first position dataset only contains flights that are arriving at that airport, i.e., it does not contain first position for flights departing that airport.
Weather
LAMP (Localized Aviation MOS (Model Output Statistics) Program), a weather forecast service operated by the National Weather Service, will be the primary source of weather data. LAMP includes data for each of the airport facilities in the challenge. In addition to the temperature and humidity you'll find in your favorite weather app, LAMP includes quantities that are particularly relevant to aviation, such as visibility, cloud ceiling, and likelihood of lightning.
LAMP includes not only the retrospective weather, but also historical weather predictions; that is, at a point in time in the past, what we thought the weather was going to be. In other words, consider the weather at noon yesterday. In hindsight I know it was sunny (retrospective), but what was my prediction at 9 AM yesterday (historical prediction)? This distinction is critical to making sure our models do not rely on information from the future, while still giving your models access to the best weather predictions available at the time. LAMP makes predictions every hour on the half hour, so 00:30, 01:30, 02:30, etc. Each prediction includes a forecast for the next 25 hours.
An extract of LAMP predictions will be available with the following columns:

- `timestamp`: The time that the forecast was generated
- `forecast_timestamp`: The time for which the forecast is predicting weather conditions
- `temperature`: Temperature in degrees Fahrenheit
- `wind_direction`: Wind direction in compass heading divided by 10 and rounded to the nearest integer (to match runway codes)
- `wind_speed`: Wind speed in knots
- `wind_gust`: Wind gust speed in knots
- `cloud_ceiling`: Cloud ceiling height in feet, encoded as category indices:
  - `1`: <200 feet
  - `2`: 200–400 feet
  - `3`: 500–900 feet
  - `4`: 1,000–1,900 feet
  - `5`: 2,000–3,000 feet
  - `6`: 3,100–6,500 feet
  - `7`: 6,600–12,000 feet
  - `8`: >12,000 feet
- `visibility`: Visibility in miles, encoded as category indices:
  - `1`: <½ mile
  - `2`: ½–1 mile
  - `3`: 1–2 miles
  - `4`: 2–3 miles
  - `5`: 3–5 miles
  - `6`: 6 miles
  - `7`: >6 miles
- `cloud`: Total sky cover category:
  - `"BK"`: broken
  - `"CL"`: clear
  - `"FEW"`: few
  - `"OV"`: overcast
  - `"SC"`: scattered
- `lightning_prob`: Probability of lightning:
  - `"N"`: none
  - `"L"`: low
  - `"M"`: medium
  - `"H"`: high
- `precip`: Boolean indicating whether precipitation is expected:
  - `True`: precipitation is expected
  - `False`: no precipitation expected
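Because multiple forecasts cover the same future hour, a common pattern is to keep, for each `forecast_timestamp`, only the most recently issued forecast available at your prediction time. A sketch with a made-up four-row extract:

```python
import pandas as pd

# Hypothetical LAMP extract: two forecasts issued on the half hour,
# each predicting conditions for two future times
lamp = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-12-31 09:30", "2022-12-31 09:30",
        "2022-12-31 10:30", "2022-12-31 10:30",
    ]),
    "forecast_timestamp": pd.to_datetime([
        "2022-12-31 11:00", "2022-12-31 12:00",
        "2022-12-31 11:00", "2022-12-31 12:00",
    ]),
    "temperature": [40.0, 41.0, 42.0, 43.0],
})

prediction_time = pd.Timestamp("2022-12-31 11:15")

# Use only forecasts issued at or before the prediction time, then take
# the most recently issued forecast for each forecast_timestamp
available = lamp[lamp["timestamp"] <= prediction_time]
latest = (
    available.sort_values("timestamp")
    .groupby("forecast_timestamp", as_index=False)
    .last()
)
```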
Check out the Meteorological Development Lab website and the original LAMP paper for more information.
Labels
Within the tar archive for each airport, labels are provided in `train_labels_<airport>.csv`. The target variable is `minutes_until_pushback`: the minutes until actual pushback time from the time of prediction for a given flight. Predictions are made every 15 minutes, starting from roughly an hour before scheduled pushback time until actual pushback time.

Here is an example showing `minutes_until_pushback` beginning at 04:00:00 at Chicago O'Hare (KORD). Each individual flight ID or `gufi` has labels for multiple timestamps.
gufi | timestamp | airport | minutes_until_pushback |
---|---|---|---|
SKW5143.ORD.EAU.201031.0059.0006.TFM | 2020-11-15 04:00:00 | KORD | 85 |
SKW5143.ORD.EAU.201031.0059.0006.TFM | 2020-11-15 04:15:00 | KORD | 70 |
SKW5143.ORD.EAU.201031.0059.0006.TFM | 2020-11-15 04:30:00 | KORD | 55 |
SKW5143.ORD.EAU.201031.0059.0006.TFM | 2020-11-15 04:45:00 | KORD | 40 |
SKW5143.ORD.EAU.201031.0059.0006.TFM | 2020-11-15 05:00:00 | KORD | 25 |
SKW5143.ORD.EAU.201031.0059.0006.TFM | 2020-11-15 05:15:00 | KORD | 10 |
Submission format
For the Open Arena, you will be submitting a CSV of predictions for each flight and prediction time, where the index is `gufi`, `airport`, and `timestamp`:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `airport`: Airport code
- `timestamp`: Prediction time

The `minutes_until_pushback` column should contain an integer representing the number of minutes from the time of prediction (`timestamp`) until the actual pushback time. To generate a submission, download the submission format from the Data download page and replace the values in the `minutes_until_pushback` column with your predictions. The other columns of your submission must exactly match the submission format.
Note that the submission format may be either a CSV or zipped CSV. Zipping your submission can dramatically reduce the file size and upload time.
For example, the first few rows of `submission_format.csv` are:
gufi,timestamp,airport,minutes_until_pushback
AAL1008.ATL.DFW.210607.2033.0110.TFM,2021-06-08 19:15:00,KATL,0
AAL1008.ATL.DFW.210607.2033.0110.TFM,2021-06-08 19:30:00,KATL,0
AAL1008.ATL.DFW.210607.2033.0110.TFM,2021-06-08 19:45:00,KATL,0
AAL1008.ATL.DFW.210607.2033.0110.TFM,2021-06-08 20:00:00,KATL,0
And your submission might look like this:
gufi | timestamp | airport | minutes_until_pushback |
---|---|---|---|
AAL1008.ATL.DFW.210607.2033.0110.TFM | 2021-06-08 19:15:00 | KATL | 92 |
AAL1008.ATL.DFW.210607.2033.0110.TFM | 2021-06-08 19:30:00 | KATL | 75 |
AAL1008.ATL.DFW.210607.2033.0110.TFM | 2021-06-08 19:45:00 | KATL | 62 |
AAL1008.ATL.DFW.210607.2033.0110.TFM | 2021-06-08 20:00:00 | KATL | 44 |
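Generating a submission is then just a matter of overwriting one column. A sketch using an in-memory stand-in for `submission_format.csv` (the real file is downloaded from the Data download page):

```python
import pandas as pd

# Stand-in for the first rows of submission_format.csv
submission = pd.DataFrame({
    "gufi": ["AAL1008.ATL.DFW.210607.2033.0110.TFM"] * 2,
    "timestamp": ["2021-06-08 19:15:00", "2021-06-08 19:30:00"],
    "airport": ["KATL", "KATL"],
    "minutes_until_pushback": [0, 0],
})

# Replace only the prediction column; keep all other columns untouched
submission["minutes_until_pushback"] = [92, 75]

# Predictions must be integers; zipping the CSV shrinks the upload
submission.to_csv("submission.csv", index=False)
```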
Performance metric
Performance is evaluated according to Mean Absolute Error (MAE), which measures how much the estimated values differ from the observed values. MAE is the mean of the magnitude of the differences between the predicted values and the ground truth. MAE is always non-negative, with lower values indicating a better fit to the data. The competitor that minimizes this metric will top the leaderboard.
$$ MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| $$
where
- $N$ is the number of samples
- $\hat{y}_i$ is the estimated minutes until pushback for the $i$th sample
- $y_i$ is the actual minutes until pushback of the $i$th sample
In this case, each sample is a unique combination of `gufi` and `timestamp`.
In Python, you can calculate MAE using the scikit-learn function `sklearn.metrics.mean_absolute_error(y_true, y_pred)`.
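A worked example with three made-up samples (computed with numpy here, but it gives the same result as the scikit-learn function above):

```python
import numpy as np

y_true = np.array([85, 70, 55])   # actual minutes until pushback
y_pred = np.array([80, 72, 60])   # model estimates

# Equivalent to sklearn.metrics.mean_absolute_error(y_true, y_pred):
# mean of |85-80|, |70-72|, |55-60| = (5 + 2 + 5) / 3 = 4.0
mae = np.mean(np.abs(y_true - y_pred))
```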
Keep in mind that scores displayed on the public Open Arena leaderboard while the competition is running will not be the same as the final scores on the Prescreened Arena leaderboard.
See if you can beat the baseline score on the leaderboard, which simply predicts the number of minutes until 15 minutes before the latest estimated time of departure. This simple baseline works because most flights push back around 15 minutes before departure, and most flights depart on time.
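The baseline described above can be sketched as follows, using a made-up two-row `etd` frame; the real baseline implementation may differ in detail:

```python
import pandas as pd

def baseline_prediction(etd, prediction_time):
    """Predict minutes until 15 minutes before the latest estimated
    runway departure time available at prediction_time."""
    available = etd[etd["timestamp"] <= prediction_time]
    latest = available.sort_values("timestamp").iloc[-1]
    est_pushback = latest["estimated_runway_departure_time"] - pd.Timedelta(minutes=15)
    return int((est_pushback - prediction_time).total_seconds() // 60)

# Hypothetical ETD updates for one flight
etd = pd.DataFrame({
    "timestamp": pd.to_datetime(["2022-12-31 11:00", "2022-12-31 11:45"]),
    "estimated_runway_departure_time": pd.to_datetime(
        ["2022-12-31 12:50", "2022-12-31 13:00"]
    ),
})

# Latest ETD at noon is 13:00, so estimated pushback is 12:45 -> 45 minutes
pred = baseline_prediction(etd, pd.Timestamp("2022-12-31 12:00"))
```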
Good luck
Good luck and enjoy this challenge! Check out the benchmark blog post for tips on how to get started. If you have any questions you can always visit the user forum.