Problem description
In this challenge, you will predict the pushback time of departing flights (how long until the plane departs from the gate) using features that capture air traffic and weather conditions. Your goal is to build a model that predicts the minutes until pushback time. It is useful to predict pushback time at various points before a plane’s scheduled departure, so your model will make predictions from roughly an hour before scheduled pushback until the actual pushback time. For each flight, you'll generate predictions based on the information available at a variety of different times leading up to the actual pushback. Each prediction is a unique combination of flight ID, airport, and time (here we'll call this prediction time).
Finalists from Phase 1 will be required to participate in Phase 2, during which they will work with NASA to train a federated version of their model. It is easier to combine weights for certain types of models, and therefore some types of models are more easily federated than others. Keep this in mind as you develop your solutions!
- Performance metric
- Mean Absolute Error
Timeline and leaderboard
All participants can enter the Open Arena of this challenge (you are here!). In the Open Arena, participants can work on their solutions and get live feedback from a public leaderboard. Participants who attest to their eligibility can enter the Prescreened Arena where participants will submit executable code submissions that determine the final rankings.
Phase 1: Open model development | Phase 1: Code execution (Prescreened only) | Phase 2: Federated learning (Phase 1 finalists only) |
---|---|---|
Feb 1 - Apr 17, 2023 | Feb 22 - Apr 17, 2023 | Jun 1 - Aug 31, 2023 |
Submit pushback predictions for the validation set to the Open Arena. Scores are displayed on the Open Arena public leaderboard. | Submit code to the Prescreened Arena, which we will execute to compute pushback predictions for the test set. These scores are displayed on the Prescreened public leaderboard and will be used to determine prize rankings. | Finalists from Phase 1 will train a federated version of their model using NASA's federated learning platform. |
Phase 1: Model development
Upon registering for the contest, participants will enter the Open Arena (you are here!). The Open Arena provides access to a training data set including 10 airports over approximately two years, as well as feature data for a withheld validation set sampled from the same time range.
Throughout this phase, participants can submit and score predicted pushback times to get feedback on their model’s performance. Once a participant submits predictions according to the Submission Format, those predictions will be compared to the ground truth and the score (mean absolute error) will be shown on the public leaderboard.
Pre-screening: In order to be eligible for the Prescreened Arena, participants must submit an attestation that they are eligible to participate according to the rules. Finalists from the Prescreened Arena will have to submit proof of eligibility.
Once successfully prescreened, participants can:
- Enter the Prescreened Arena.
- Make executable code submissions to the containerized test harness, which will execute submissions on the Prescreened Arena test data and produce scores for the Prescreened public leaderboard. The Prescreened private leaderboard determines final ranking. More detailed instructions will be provided in the Prescreened Arena.
Solutions in the Prescreened Arena may use both training and validation data from the Open Arena as training data for submissions. The test harness will execute solutions on a disjoint sample with a similar distribution as the Open Arena.
Phase 2: Federated Learning
Finalists from Phase 1 will be required to train a federated version of their models in Phase 2. Finalists will use a model-agnostic federated learning platform to re-train their model. Some features will remain centralized, while others will be partitioned by airline. Successfully federated models will be able to aggregate model weights from an arbitrary number of partitions. Read on for more detail about the competition's federated learning requirements.
The goal of Phase 2 is to explore federated learning techniques that enable independent clients (also known as parties, organizations, groups) to jointly train a global model without sharing or pooling data. In this case, the clients are airlines. Federated learning is a perfect match for the problem of pushback time prediction because airlines collect a lot of information that is relevant to pushback time, but too valuable or sensitive to share, like the number of passengers that have checked in for a flight or the number of bags that have been loaded onto a plane. Federated learning enables airlines to safely contribute the valuable and sensitive data they collect towards a centralized model. Each finalist from Phase 1 will be invited to translate their winning Phase 1 model into a model that can be trained in a federated manner. The data used in Phase 2 will be the same data as used in Phase 1, but it will be divided into:
- "Public" or "non-federated" variables for which all rows are available to all airlines
- "Private" or "federated" variables for which only rows corresponding to a flight that an airline operates are available to that airline
The exact variables airlines would want to protect by federating are, by definition, too sensitive to release for a competition. Since we don't have access to the actual airline data that would be federated, we'll simulate the federated scenario by treating a subset of variables as if they are private. All of the federated variables for Phase 2 come from the Phase 1 `mfs` and `standtimes` datasets:
Federated `mfs` variables:

- `aircraft_engine_class`
- `aircraft_type`
- `major_carrier`
- `flight_type`

Federated `standtimes` variables:

- `departure_stand_actual_time`
Each airline will have access to its federated variables and all non-federated variables. Airlines will not have access to federated variables from other airlines. In Phase 2, these data constraints apply during training and prediction.
Below is an example of airline-specific federated variables using a sample of the `KMEM_mfs.csv.bz2` file:
gufi | aircraft_engine_class | aircraft_type | major_carrier | flight_type | isdeparture |
---|---|---|---|---|---|
AAL1007.DFW.MEM.211224.0015.0120.TFM | JET | B738 | AAL | SCHEDULED_AIR_TRANSPORT | False |
AAL1007.DFW.MEM.211231.0015.0066.TFM | JET | B738 | AAL | SCHEDULED_AIR_TRANSPORT | False |
AAL1010.DFW.STL.210831.0153.0171.TFM | JET | A320 | AAL | SCHEDULED_AIR_TRANSPORT | False |
AAL1017.DFW.MEM.220205.0035.0087.TFM | JET | A320 | AAL | SCHEDULED_AIR_TRANSPORT | False |
AAL1017.DFW.MEM.220207.0035.0088.TFM | JET | A320 | AAL | SCHEDULED_AIR_TRANSPORT | False |
AAL1017.DFW.MEM.220208.0035.0135.TFM | JET | A320 | AAL | SCHEDULED_AIR_TRANSPORT | False |
UAL2477.MEM.EWR.220710.1915.0167.TFM | JET | A320 | UAL | SCHEDULED_AIR_TRANSPORT | True |
UAL2477.MEM.EWR.220711.1915.0070.TFM | JET | A319 | UAL | SCHEDULED_AIR_TRANSPORT | True |
UAL2477.MEM.EWR.220712.1915.0144.TFM | JET | A319 | UAL | SCHEDULED_AIR_TRANSPORT | True |
UAL2477.MEM.EWR.220713.1915.0064.TFM | JET | A319 | UAL | SCHEDULED_AIR_TRANSPORT | True |
The airline is indicated as a code at the beginning of the GUFI, e.g., "AAL" is American Airlines and "UAL" is United Airlines. During training, you will simulate an AAL client that trains using its own rows of the federated variables (the AAL rows above) plus any of the public data and transmits model updates to a centralized server. A UAL client (using only the UAL rows) and clients for the other airlines will do the same. The centralized server decides how to aggregate all of the individual model weights into a single global model. During prediction, each airline will use the final trained model to make pushback predictions for all of the flights it operates. Your federated learning approach will be scored on the same time period and same flights as the test set from Phase 1; the only difference is that the federated variables can only be accessed by the airline that produced them.
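To make the airline partitioning concrete, here is a minimal pandas sketch of how a simulated client could be restricted to its own airline's rows by parsing the carrier code from the GUFI. The two-row sample frame is made up for illustration; the real `mfs` files have more columns.

```python
import pandas as pd

# Hypothetical sample resembling the KMEM mfs extract above
mfs = pd.DataFrame({
    "gufi": [
        "AAL1007.DFW.MEM.211224.0015.0120.TFM",
        "UAL2477.MEM.EWR.220710.1915.0167.TFM",
    ],
    "aircraft_type": ["B738", "A320"],
    "major_carrier": ["AAL", "UAL"],
})

# The airline code is the leading letters of the GUFI (e.g., "AAL", "UAL")
mfs["airline"] = mfs["gufi"].str.extract(r"^([A-Z]+)", expand=False)

# Each simulated client sees only its own rows of the federated variables
clients = {airline: rows for airline, rows in mfs.groupby("airline")}
```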
In the spirit of experimentation, you will have some flexibility in how you translate your Phase 1 winning model during Phase 2. Do some strategies for combining the models from individual clients work better than others? Are there basic changes that your model needs to operate in a federated setting? We want to know! We will release more of the specific requirements as Phase 2 ramps up, but for now get started thinking about what it will take to turn your winning centralized model into an effective federated model!
About Federated Learning
There are a vast number of public and private organizations that collect and provide flight and airspace related data. However, privacy and intellectual property concerns prevent much of this data from being aggregated, and thus hamper the ability of models and analysts to make the best predictions and decisions.
Federated learning (FL), also known as collaborative learning, is a technique for collaboratively training a shared machine learning model across data from multiple parties while preserving each party's data privacy. Federated learning stands in contrast to the typical centralized machine learning, where the training data needs to be collected and centralized for training. Requiring the parties to share their data compromises the privacy of that data!
In Phase 1, your model does not have to be federated. However, you should choose an architecture with federation in mind. For example, linear models where weights can be aggregated with simple addition are easier to train in a federated way than tree-based models. You can learn more about federated learning on the About page.
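As an illustration of why weight aggregation is simple for linear models, here is a toy sketch of one-shot federated averaging: each hypothetical client fits ordinary least squares on its own synthetic rows, and a server averages the coefficients weighted by sample counts. All data is synthetic, and this is not the Phase 2 platform's actual protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y):
    # Ordinary least squares on one client's local data
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Two hypothetical airline clients with private rows of the same features
clients = []
for n in (200, 100):
    X = rng.normal(size=(n, 3))
    y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

# Server-side aggregation: sample-weighted average of client coefficients
coefs = [fit_linear(X, y) for X, y in clients]
weights = np.array([len(X) for X, _ in clients], dtype=float)
global_coef = np.average(coefs, axis=0, weights=weights)
```

Averaging tree structures from different clients is far less straightforward, which is why model choice matters here.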
Data
This challenge is possible because of the effort that NASA, the FAA, airlines, and other agencies undertake to collect, process, and distribute data to decision makers in near real-time. You will be working with around two years of historical data, but any solution you develop could be translated directly into a pipeline with access to data collected in real time.
10 airports whose data is included in this competition
The data download page contains a tar archive for each of these airports. The structure of each archive is:
├── <airport>
│ ├── <airport>_config.csv.bz2
│ ├── <airport>_etd.csv.bz2
│ ├── <airport>_first_position.csv.bz2
│ ├── <airport>_lamp.csv.bz2
│ ├── <airport>_mfs.csv.bz2
│ ├── <airport>_runways.csv.bz2
│ ├── <airport>_standtimes.csv.bz2
│ ├── <airport>_tbfm.csv.bz2
│ └── <airport>_tfm.csv.bz2
└── train_labels_<airport>.csv.bz2
Read on to learn more about each of these files!
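Each archive member is a bz2-compressed CSV, which pandas can read directly. A self-contained sketch (it writes and reads back a tiny stand-in file, since the real archives aren't assumed here; the real files follow the same pattern):

```python
import pandas as pd

# Tiny stand-in for one archive member, e.g., KMEM_etd.csv.bz2
sample = pd.DataFrame({
    "gufi": ["AAL1007.DFW.MEM.211224.0015.0120.TFM"],
    "timestamp": ["2021-12-23 23:00:00"],
})
sample.to_csv("KMEM_etd_sample.csv.bz2", index=False)

# pandas infers bz2 compression from the .bz2 extension
etd = pd.read_csv("KMEM_etd_sample.csv.bz2", parse_dates=["timestamp"])
```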
It is up to you to ensure that you are filtering feature data properly during model development in the Open Arena according to the following restrictions:
Time: This is a real-time estimation task. You may only use data that was available from the previous 30 hours up through the time of estimation when generating predictions. For example, if you are predicting pushback for a flight on December 31, 2022 at 12 pm, you can use any data that was available between 6 am on 12/30 and 12 pm on 12/31. Most feature data includes a `timestamp` column that indicates the time that the observation was made available; use this column for filtering (more details below).
Each prediction should be treated as an independent observation. That means that you should not use past predictions as input for inference. In the Prescreened code execution environment, you will not be able to keep track of past predictions or features.
Location: You may only use data from the airport from which a flight is departing. The submission format will include an `airport` column which you can use to filter. All input data will be partitioned by airport.
These restrictions won't be enforced in the Open Arena, but will be automatically enforced in the Prescreened Arena's code execution harness. Filtering your data properly in the Open Arena will give you a much more accurate sense of how your model will perform in the final evaluation.
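The time restriction can be sketched as a simple pandas filter (function and frame names are illustrative; feature files are already partitioned by airport, so only the time filter is shown):

```python
import pandas as pd

def filter_features(features, prediction_time):
    """Keep only rows available at prediction_time: timestamps within
    the 30 hours up to and including prediction_time."""
    window_start = prediction_time - pd.Timedelta(hours=30)
    return features[
        (features["timestamp"] >= window_start)
        & (features["timestamp"] <= prediction_time)
    ]

# Illustrative rows matching the 12/30 6 am - 12/31 12 pm example above
features = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-12-30 05:59:00",  # too old: outside the 30-hour window
        "2022-12-30 06:00:00",  # oldest allowable observation
        "2022-12-31 11:00:00",  # allowable
        "2022-12-31 12:30:00",  # from the future: not allowable
    ])
})
allowed = filter_features(features, pd.Timestamp("2022-12-31 12:00:00"))
```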
Feature data
The feature data for this competition includes information about air traffic and weather conditions.
Air traffic
This competition uses air traffic data from Fuser, a data processing platform designed by NASA as part of the ATD-2 project. Fuser processes the FAA's raw data stream and distributes cleaned, real-time data on the status of individual flights nationwide.
On the data download page there is a separate tar archive for each airport. Each tar archive contains the files listed below.
Actual departure time and runway code
`<airport>_runways.csv` has one row for each flight and contains the following columns:

- `departure_runway_actual`: The flight's actual departure runway as a runway code, e.g., 18R, 17C, etc.
- `departed_runway_actual_time`: The time that the flight departed from the runway
- `arrival_runway_actual`: The flight's actual arrival runway as a runway code, e.g., 18R, 17C, etc.
- `arrival_runway_actual_time`: The time that the flight arrived at the runway
Airport configuration
`<airport>_config.csv` describes the active runway configuration at different times. Runway configuration is the combination of runways used for arrivals and departures and the flow direction on those runways. Each row is a different time at the given airport, and the columns are:

- `timestamp`: The time that the Fuser system received the data (use this for filtering)
- `start_time`: The time the configuration took effect
- `arrival_runways`: The active arrival runway configuration starting from `start_time`, as comma-separated runway codes, e.g., 18R, 17C, etc.
- `departure_runway`: The active departure runway configuration starting from `start_time`, as comma-separated runway codes, e.g., 18R, 17C, etc.
Estimated departure times
`<airport>_etd.csv` may have multiple rows for each flight and contains the following columns:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `timestamp`: The time that the prediction was generated
- `estimated_runway_departure_time`: Estimated time that the flight will depart from the runway
Estimated arrival times
TFM (traffic flow management) and TBFM (time-based flow management) are two FAA systems that track flights in the National Airspace System (NAS). TFM and TBFM forecast the estimated time of arrival (ETA) continuously throughout the duration of a flight.
TFM forecasts are available as `<airport>_tfm.csv` and contain the following columns:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `timestamp`: The time that the prediction was generated
- `arrival_runway_estimated_time`: Estimated time that the flight will arrive at the runway
TBFM forecasts are available as `<airport>_tbfm.csv` and contain the following columns:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `timestamp`: The time that the prediction was generated
- `scheduled_runway_estimated_time`: Scheduled time that the flight will arrive at the runway
MFS event times
MFS provides actual event times and flight information during the lifecycle of a flight. MFS data is provided in two files: metadata (`<airport>_mfs.csv`) and stand times (`<airport>_standtimes.csv`).

MFS metadata are available as `<airport>_mfs.csv` and contain critical information about the flight. MFS metadata differ from the other features in that they do not include a timestamp. It is assumed that these flight metadata are available for any flight for which a GUFI exists. However, certain uses of this metadata CSV violate the real-time constraints of the problem.
You may:
- Look up (or "join") metadata for GUFIs that you already know exist within the valid time window from other timestamped features.
- Use MFS data for the current flight that predictions are being generated for.
You may not:
- Use the MFS metadata to look for information about flights for which you do not already have a GUFI from timestamped features.
- Analyze the entire MFS metadata file to, for example, directly incorporate the distribution of aircraft type, carriers, etc. into your solution.
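In pandas terms, the allowed usage is a lookup join keyed on GUFIs you already hold from timestamped features. All data below is made up for illustration:

```python
import pandas as pd

# GUFIs you already know about from a timestamped feature (e.g., etd)
known_gufis = pd.DataFrame({"gufi": ["AAL1007.DFW.MEM.211224.0015.0120.TFM"]})

# Hypothetical slice of MFS metadata
mfs = pd.DataFrame({
    "gufi": [
        "AAL1007.DFW.MEM.211224.0015.0120.TFM",
        "UAL2477.MEM.EWR.220710.1915.0167.TFM",
    ],
    "aircraft_type": ["B738", "A320"],
})

# Allowed: left-join metadata onto the GUFIs you already have. Scanning
# the full mfs file to build distributions over all flights is NOT allowed.
features = known_gufis.merge(mfs, on="gufi", how="left")
```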
`<airport>_mfs.csv` has one row for each flight and contains the columns:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `aircraft_engine_class`: The class of engine
- `aircraft_type`: The type of aircraft
- `major_carrier`: The airline carrier
- `flight_type`: The type of flight
- `is_departure`: `True` if the flight is a departure, else `False` if it is an arrival
`<airport>_standtimes.csv` may have multiple rows for each flight and contains the actual arrival and departure times to/from a gate. For a given flight being predicted, this information is not available at the time of prediction. The columns are:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `timestamp`: The time that the Fuser system received the data
- `arrival_stand_actual_time`: The time the flight arrived at the gate at the destination airport
- `departure_stand_actual_time`: The time the flight departed the gate (the pushback time). At prediction time, this variable is known for other flights that have already pushed back, but is not available for the flight being predicted.
A note on leakage: In the Open Arena, you will have access to the `departure_stand_actual_time` (the pushback time) for all flights. It will be easy to leak this data into training, which would mean that your performance does not accurately reflect how your model will do in the Prescreened Arena. In the Prescreened Arena, you will be asked to provide a function that can produce predictions for a set of flights at a single airport and timestamp pair. The function will be passed all allowable data, including feature data from the past 30 hours up until the time of prediction. This does not include the actual pushback time for a given flight. Additionally, you will not be allowed to bookkeep or save data in between predictions during inference.
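To give a feel for the constraint, here is a hypothetical sketch of such a function. The signature and argument names are assumptions for illustration only; the official interface will be specified in the Prescreened Arena.

```python
import pandas as pd

def predict(etd, mfs, lamp, partial_submission):
    """Hypothetical shape of a function called once per (airport, timestamp)
    pair. It must be stateless: no caching or bookkeeping between calls.
    Placeholder logic fills in a constant guess instead of a real model."""
    preds = partial_submission.copy()
    preds["minutes_until_pushback"] = 15
    return preds

# Illustrative call with a one-row slice of the submission format
partial = pd.DataFrame({
    "gufi": ["AAL1008.ATL.DFW.210607.2033.0110.TFM"],
    "timestamp": ["2021-06-08 19:15:00"],
    "airport": ["KATL"],
    "minutes_until_pushback": [0],
})
out = predict(None, None, None, partial)
```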
First position
`<airport>_first_position.csv` contains the following columns:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `timestamp`: The time that a flight was first tracked by the NAS systems
Note that while first position is tracked for all flights in NAS, an airport's first position dataset only contains flights that are arriving at that airport, i.e., it does not contain first position for flights departing that airport.
Weather
LAMP (Localized Aviation MOS (Model Output Statistics) Program), a weather forecast service operated by the National Weather Service, will be the primary source of weather data. LAMP includes data for each of the airport facilities in the challenge. In addition to the temperature and humidity you'll find in your favorite weather app, LAMP includes quantities that are particularly relevant to aviation, such as visibility, cloud ceiling, and likelihood of lightning.
LAMP includes not only the retrospective weather, but also historical weather predictions; that is, at a point in time in the past, what we thought the weather was going to be. In other words, consider the weather at noon yesterday. In hindsight I know it was sunny (retrospective), but what was my prediction at 9 AM yesterday (historical prediction)? This distinction is critical to making sure our models do not rely on information from the future, while still giving your models access to the best weather predictions available at the time. LAMP makes predictions every hour on the half hour, so 00:30, 01:30, 02:30, etc. Each prediction includes a forecast for the next 25 hours.
An extract of LAMP predictions will be available with the following columns:

- `timestamp`: The time that the forecast was generated
- `forecast_timestamp`: The time for which the forecast is predicting weather conditions
- `temperature`: Temperature in degrees Fahrenheit
- `wind_direction`: Wind direction in compass heading divided by 10 and rounded to the nearest integer (to match runway codes)
- `wind_speed`: Wind speed in knots
- `wind_gust`: Wind gust speed in knots
- `cloud_ceiling`: Cloud ceiling height in feet, encoded as category indices:
  - `1`: <200 feet
  - `2`: 200–400 feet
  - `3`: 500–900 feet
  - `4`: 1,000–1,900 feet
  - `5`: 2,000–3,000 feet
  - `6`: 3,100–6,500 feet
  - `7`: 6,600–12,000 feet
  - `8`: >12,000 feet
- `visibility`: Visibility in miles, encoded as category indices:
  - `1`: <½ mile
  - `2`: ½–1 mile
  - `3`: 1–2 miles
  - `4`: 2–3 miles
  - `5`: 3–5 miles
  - `6`: 6 miles
  - `7`: >6 miles
- `cloud`: Total sky cover category:
  - `"BK"`: broken
  - `"CL"`: clear
  - `"FEW"`: few
  - `"OV"`: overcast
  - `"SC"`: scattered
- `lightning_prob`: Probability of lightning:
  - `"N"`: none
  - `"L"`: low
  - `"M"`: medium
  - `"H"`: high
- `precip`: Boolean indicating whether precipitation is expected:
  - `True`: precipitation is expected
  - `False`: no precipitation expected
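Because multiple forecasts cover the same future hour, a common pattern is to keep, for each `forecast_timestamp`, only the most recently issued forecast available at your prediction time. A sketch with a made-up four-row extract:

```python
import pandas as pd

# Hypothetical LAMP extract: two forecasts issued on the half hour,
# each predicting conditions for two future times
lamp = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-12-31 09:30", "2022-12-31 09:30",
        "2022-12-31 10:30", "2022-12-31 10:30",
    ]),
    "forecast_timestamp": pd.to_datetime([
        "2022-12-31 11:00", "2022-12-31 12:00",
        "2022-12-31 11:00", "2022-12-31 12:00",
    ]),
    "temperature": [40.0, 41.0, 42.0, 43.0],
})

prediction_time = pd.Timestamp("2022-12-31 11:15")

# Use only forecasts issued at or before the prediction time, then take
# the most recently issued forecast for each forecast_timestamp
available = lamp[lamp["timestamp"] <= prediction_time]
latest = (
    available.sort_values("timestamp")
    .groupby("forecast_timestamp", as_index=False)
    .last()
)
```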
Check out the Meteorological Development Lab website and the original LAMP paper for more information.
Labels
Within the tar archive for each airport, labels are provided in `train_labels_<airport>.csv`. The target variable is `minutes_until_pushback`: the minutes until actual pushback time from the time of prediction for a given flight. Predictions are made every 15 minutes, starting from roughly an hour before scheduled pushback time until actual pushback time.

Here is an example showing `minutes_until_pushback` beginning at 04:00:00 at Chicago O'Hare (KORD). Each individual flight ID or `gufi` has labels for multiple timestamps.
gufi | timestamp | airport | minutes_until_pushback |
---|---|---|---|
SKW5143.ORD.EAU.201031.0059.0006.TFM | 2020-11-15 04:00:00 | KORD | 85 |
SKW5143.ORD.EAU.201031.0059.0006.TFM | 2020-11-15 04:15:00 | KORD | 70 |
SKW5143.ORD.EAU.201031.0059.0006.TFM | 2020-11-15 04:30:00 | KORD | 55 |
SKW5143.ORD.EAU.201031.0059.0006.TFM | 2020-11-15 04:45:00 | KORD | 40 |
SKW5143.ORD.EAU.201031.0059.0006.TFM | 2020-11-15 05:00:00 | KORD | 25 |
SKW5143.ORD.EAU.201031.0059.0006.TFM | 2020-11-15 05:15:00 | KORD | 10 |
Submission format
For the Open Arena, you will be submitting a CSV of predictions for each flight and prediction time, where the index is `gufi`, `airport`, and `timestamp`:

- `gufi`: GUFI (Global Unique Flight Identifier)
- `airport`: Airport code
- `timestamp`: Prediction time

The `minutes_until_pushback` column should contain an integer representing the number of minutes from the time of prediction (`timestamp`) until the actual pushback time. To generate a submission, download the submission format from the Data download page and replace the values in the `minutes_until_pushback` column with your predictions. The other columns of your submission must exactly match the submission format.
Note that the submission format may be either a CSV or zipped CSV. Zipping your submission can dramatically reduce the file size and upload time.
For example, the first few rows of `submission_format.csv` are:
gufi,timestamp,airport,minutes_until_pushback
AAL1008.ATL.DFW.210607.2033.0110.TFM,2021-06-08 19:15:00,KATL,0
AAL1008.ATL.DFW.210607.2033.0110.TFM,2021-06-08 19:30:00,KATL,0
AAL1008.ATL.DFW.210607.2033.0110.TFM,2021-06-08 19:45:00,KATL,0
AAL1008.ATL.DFW.210607.2033.0110.TFM,2021-06-08 20:00:00,KATL,0
And your submission might look like this:
gufi | timestamp | airport | minutes_until_pushback |
---|---|---|---|
AAL1008.ATL.DFW.210607.2033.0110.TFM | 2021-06-08 19:15:00 | KATL | 92 |
AAL1008.ATL.DFW.210607.2033.0110.TFM | 2021-06-08 19:30:00 | KATL | 75 |
AAL1008.ATL.DFW.210607.2033.0110.TFM | 2021-06-08 19:45:00 | KATL | 62 |
AAL1008.ATL.DFW.210607.2033.0110.TFM | 2021-06-08 20:00:00 | KATL | 44 |
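Generating a submission is then just a matter of overwriting one column. A sketch using an in-memory stand-in for `submission_format.csv` (the real file is downloaded from the Data download page):

```python
import pandas as pd

# Stand-in for the first rows of submission_format.csv
submission = pd.DataFrame({
    "gufi": ["AAL1008.ATL.DFW.210607.2033.0110.TFM"] * 2,
    "timestamp": ["2021-06-08 19:15:00", "2021-06-08 19:30:00"],
    "airport": ["KATL", "KATL"],
    "minutes_until_pushback": [0, 0],
})

# Replace only the prediction column; keep all other columns untouched
submission["minutes_until_pushback"] = [92, 75]

# Predictions must be integers; zipping the CSV shrinks the upload
submission.to_csv("submission.csv", index=False)
```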
Performance metric
Performance is evaluated according to Mean Absolute Error (MAE), which measures how much the estimated values differ from the observed values. MAE is the mean of the magnitude of the differences between the predicted values and the ground truth. MAE is always non-negative, with lower values indicating a better fit to the data. The competitor that minimizes this metric will top the leaderboard.
$$ MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i| $$
where
- $N$ is the number of samples
- $\hat{y}_i$ is the estimated minutes until pushback for the $i$th sample
- $y_i$ is the actual minutes until pushback of the $i$th sample
In this case, each sample is a unique combination of `gufi` and `timestamp`.
In Python, you can calculate MAE using the scikit-learn function `sklearn.metrics.mean_absolute_error(y_true, y_pred)`.
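A worked example with three made-up samples (computed with numpy here, but it gives the same result as the scikit-learn function above):

```python
import numpy as np

y_true = np.array([85, 70, 55])   # actual minutes until pushback
y_pred = np.array([80, 72, 60])   # model estimates

# Equivalent to sklearn.metrics.mean_absolute_error(y_true, y_pred):
# mean of |85-80|, |70-72|, |55-60| = (5 + 2 + 5) / 3 = 4.0
mae = np.mean(np.abs(y_true - y_pred))
```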
Keep in mind that scores displayed on the public Open Arena leaderboard while the competition is running will not be the same as the final scores on the Prescreened Arena leaderboard.
See if you can beat the baseline score on the leaderboard, which simply predicts the number of minutes until 15 minutes before the latest estimated time of departure. This simple baseline works because most flights push back around 15 minutes before departure, and most flights depart on time.
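The baseline described above can be sketched as follows, using a made-up two-row `etd` frame; the real baseline implementation may differ in detail:

```python
import pandas as pd

def baseline_prediction(etd, prediction_time):
    """Predict minutes until 15 minutes before the latest estimated
    runway departure time available at prediction_time."""
    available = etd[etd["timestamp"] <= prediction_time]
    latest = available.sort_values("timestamp").iloc[-1]
    est_pushback = latest["estimated_runway_departure_time"] - pd.Timedelta(minutes=15)
    return int((est_pushback - prediction_time).total_seconds() // 60)

# Hypothetical ETD updates for one flight
etd = pd.DataFrame({
    "timestamp": pd.to_datetime(["2022-12-31 11:00", "2022-12-31 11:45"]),
    "estimated_runway_departure_time": pd.to_datetime(
        ["2022-12-31 12:50", "2022-12-31 13:00"]
    ),
})

# Latest ETD at noon is 13:00, so estimated pushback is 12:45 -> 45 minutes
pred = baseline_prediction(etd, pd.Timestamp("2022-12-31 12:00"))
```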
Good luck
Good luck and enjoy this challenge! Check out the benchmark blog post for tips on how to get started. If you have any questions you can always visit the user forum.