Problem description

In this challenge, you will predict how an airport will be configured in the future using features that capture air traffic and weather conditions. Your goal is to build a model that predicts the probability of possible configurations, looking ahead every 30 minutes for the next 6 hours.

The solution should perform well across different airports, each with its own set of runways and airport configurations. How you accomplish that is up to you; you might train one model per airport, or find a way to combine information across all airports.

Timeline and leaderboard

This challenge will be open to submissions during the model development period, when participants can work on their solutions, get live feedback from a public leaderboard, test executable code submissions, and make a single final scoring submission. Following this period, final scoring will be run on an out-of-sample test set that will be collected after submissions close and used to determine prize rankings.

Open model development	Executable code testing (Prescreened only)	Final scoring (Prescreened only)
Jan 25 - Apr 25, 2022	Jan 25 - Apr 25, 2022	Summer 2022
Scores run on submitted configuration predictions are calculated from the ground truth for the validation set and displayed on the public leaderboard.	Scores run on executable code submissions are calculated from the Prescreened Arena ground truth test set and displayed on the Prescreened public leaderboard.	Final code submissions from Prescreened participants are run on a new test set collected after submissions close. Final validated scores will be updated on the Prescreened leaderboard and used to determine prize rankings.

Model development

Upon registering for the contest, participants will enter the Open Arena. You will have access to a training data set including 10 airports over approximately one year. You will generate predictions for a withheld validation set sampled from the same time range.

Throughout this phase, participants can submit predicted airport configurations according to the Submission Format described below. The scoring metric will compare predictions against ground truth to produce a score, which will be used to update the public leaderboard.

Pre-screening: In order to be eligible for final scoring, prize-eligible participants must submit proof of eligibility and make a successful submission in the Prescreened Arena. See the prescreening template on the Data Download page for more information.

Once successfully Prescreened, participants will have the ability to:

Enter the Prescreened Arena (you are here!)
Make executable code submissions to the containerized test harness, which will execute submissions on the Prescreened Arena test data and produce scores for the Prescreened public leaderboard
Make a submission for Final Scoring, which will be used to determine final rankings

Solutions in the Prescreened Arena may use both training and validation data from the Open Arena as training data for submissions. The test harness will execute solutions on a short, disjoint sample taken after the time range from the Open Arena. Keep in mind that the Prescreened Arena test set is primarily intended to help participants get their code submissions running successfully for final scoring, and should not be considered a representative sample for model evaluation.

Final scoring

Final prize rankings will be determined by the results of the final scoring phase. All Prescreened participants will have the opportunity to make a single final code submission that will be run on a separate data set to be collected after submissions close. The last successfully-executed submission made in the Prescreened Arena will be used as the final scoring submission for each team.

The final scoring dataset will be collected from the same set of airports used in model development. The time range for final scoring will be approximately one month in May-June 2022, with at least two days of features available for solutions to use prior to the time range used for evaluation.

The leaderboard in the Prescreened Arena will be updated with scores on the final scoring dataset once evaluation is complete. An announcement will be added to the competition when final scores are posted.

Airport configuration data timeline

Illustration of the time ranges for the four data subsets used in the challenge.

Data

This challenge is possible because of the effort that NASA, the FAA, airlines, and other agencies undertake to collect, process, and distribute data to decision makers in near real-time. You will be working with around a year of historical data, but any solution you develop using this data could be translated directly into a pipeline with access to these same features in real-time.

Location of airports

Locations of the 10 airports whose data is included in this competition.

A note on time: Keep in mind that this is a real-time estimation task. You may only use data that was available up through the time of estimation when generating predictions. All feature tables include a timestamp column that indicates the time at which that observation became available. During inference, using any data from after the timestamp at which a prediction is generated is strictly prohibited.
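The rule above can be sketched as a simple filter. This is a minimal illustration, not part of the official tooling: the helper name available_rows and the sample rows are hypothetical, and only the gufi, timestamp, and departure_runway column names come from the data description.

```python
from datetime import datetime


def available_rows(rows, prediction_time):
    """Keep only rows whose timestamp is at or before the prediction time."""
    return [row for row in rows if row["timestamp"] <= prediction_time]


# Hypothetical rows shaped like <airport>_departure_runway.csv.
rows = [
    {"gufi": "flight-a", "timestamp": datetime(2021, 6, 1, 11, 40), "departure_runway": "18R"},
    {"gufi": "flight-b", "timestamp": datetime(2021, 6, 1, 11, 58), "departure_runway": "17C"},
    {"gufi": "flight-c", "timestamp": datetime(2021, 6, 1, 12, 5), "departure_runway": "18R"},
]

prediction_time = datetime(2021, 6, 1, 12, 0)
visible = available_rows(rows, prediction_time)
print([row["gufi"] for row in visible])
```

Applying a filter like this to every feature table before building features is one way to keep future observations out of a prediction.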

Approved features (inputs)

All data sources and access locations detailed below are pre-approved for both model training and inference. These data have undergone a careful selection process. If you are interested in using additional data sources or access locations, see the process for requesting additional data sources below.

Air traffic

Fuser, a data processing platform designed by NASA as part of the ATD-2 project, will be the primary source of air traffic data. Fuser processes the FAA's raw data stream and distributes cleaned, real-time data on the status of individual flights nationwide. An extract of Fuser data will be available as several CSVs.

Actual departure time and runway

A file <airport>_departure_runway.csv for each airport containing the actual departure time and runway with the following columns:

gufi: GUFI (Global Unique Flight Identifier)
timestamp: The time that the flight departed from the runway.
departure_runway: The flight's actual departure runway as a runway code, e.g., 18R, 17C, etc.

Actual arrival time and runway

A file <airport>_arrival_runway.csv for each airport containing the actual arrival time and runway with the following columns:

gufi: GUFI (Global Unique Flight Identifier)
timestamp: The time that the flight arrived at the runway.
arrival_runway: The flight's actual arrival runway as a runway code, e.g., 18R, 17C, etc.

Estimated departure times

A file <airport>_etd.csv for each airport containing the estimated time of departure with the following columns:

gufi: GUFI (Global Unique Flight Identifier)
timestamp: The time that the prediction was generated.
estimated_runway_departure_time: Estimated time that the flight will depart from the runway.

Estimated arrival times

TFM (traffic flow management) and TBFM (time-based flow management) are two FAA systems that track flights in the NAS. One of the many roles they serve is to forecast the estimated time of arrival (ETA) continuously throughout the duration of a flight.

TFM forecasts are available as <airport>_tfm_estimated_runway_time.csv for each airport with the following columns:

gufi: GUFI (Global Unique Flight Identifier)
timestamp: The time that the prediction was generated.
arrival_runway_estimated_time: Estimated time that the flight will arrive at the runway.

TBFM forecasts are available as <airport>_tbfm_scheduled_runway_arrival_time.csv for each airport with the following columns:

gufi: GUFI (Global Unique Flight Identifier)
timestamp: The time that the prediction was generated.
scheduled_runway_arrival_time: Scheduled time that the flight will arrive at the runway.
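Because these estimates are updated continuously, a flight can appear in many rows. One possible way to use them, sketched here under assumptions, is to keep the most recent estimate per flight that had been issued by the prediction time and count the arrivals expected in a window. The function names and sample values are hypothetical; the column names follow the TFM table above.

```python
from datetime import datetime


def latest_estimates(rows, prediction_time):
    """Most recent estimate per gufi among rows issued by prediction_time."""
    latest = {}
    for row in rows:
        if row["timestamp"] > prediction_time:
            continue  # issued after the prediction time, so not usable
        current = latest.get(row["gufi"])
        if current is None or row["timestamp"] > current["timestamp"]:
            latest[row["gufi"]] = row
    return latest


def expected_arrivals(rows, prediction_time, window_start, window_end):
    """Count flights whose latest usable ETA falls in [window_start, window_end)."""
    estimates = latest_estimates(rows, prediction_time).values()
    return sum(
        1 for row in estimates
        if window_start <= row["arrival_runway_estimated_time"] < window_end
    )


# Hypothetical rows shaped like <airport>_tfm_estimated_runway_time.csv.
tfm_rows = [
    {"gufi": "flight-a", "timestamp": datetime(2021, 6, 1, 10, 0),
     "arrival_runway_estimated_time": datetime(2021, 6, 1, 12, 20)},
    {"gufi": "flight-a", "timestamp": datetime(2021, 6, 1, 11, 30),
     "arrival_runway_estimated_time": datetime(2021, 6, 1, 12, 50)},
    {"gufi": "flight-a", "timestamp": datetime(2021, 6, 1, 12, 10),
     "arrival_runway_estimated_time": datetime(2021, 6, 1, 13, 40)},
    {"gufi": "flight-b", "timestamp": datetime(2021, 6, 1, 11, 0),
     "arrival_runway_estimated_time": datetime(2021, 6, 1, 12, 10)},
]

now = datetime(2021, 6, 1, 12, 0)
first_half_hour = expected_arrivals(tfm_rows, now, now, datetime(2021, 6, 1, 12, 30))
second_half_hour = expected_arrivals(
    tfm_rows, now, datetime(2021, 6, 1, 12, 30), datetime(2021, 6, 1, 13, 0)
)
print(first_half_hour, second_half_hour)
```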

MFS event times

MFS provides a source for actual event times in the lifecycle of a flight. Each of these event types is stored in a separate CSV:

<airport>_mfs_stand_departure_time.csv: gufi and timestamp for reaching the departure stand ("stand" is another term for gate).
<airport>_mfs_runway_departure_time.csv: gufi and timestamp for reaching the departure runway.
<airport>_mfs_runway_arrival_time.csv: gufi and timestamp for reaching the arrival runway.
<airport>_mfs_stand_arrival_time.csv: gufi and timestamp for reaching the arrival stand ("stand" is another term for gate).

Note that there may be differences in the runway times reported in the MFS data and the <airport>_arrival/departure_runway.csv data.

First position

A file <airport>_first_position.csv for each airport with the following columns:

gufi: GUFI (Global Unique Flight Identifier)
timestamp: The time that a flight was first tracked by NAS systems.

Weather

LAMP (Localized Aviation MOS (Model Output Statistics) Program), a weather forecast service operated by the National Weather Service, will be the primary source of weather data. It includes data for each of the airport facilities in the challenge. In addition to the temperature and humidity you'll find in your favorite weather app, LAMP includes quantities that are particularly relevant to aviation, such as visibility, cloud ceiling, and likelihood of lightning.

LAMP includes not only the retrospective weather, but also historical weather predictions, that is, at a point in time in the past, what we thought the weather was going to be. In other words, consider the weather at noon yesterday. In hindsight I know it was sunny (retrospective), but what was my prediction at 9 AM yesterday (historical prediction)? This distinction is critical to making sure our models do not rely on information from the future, while still giving your models access to the best weather predictions at the time they were available. LAMP makes predictions every hour on the half hour, so 00:30, 01:30, 02:30, etc. Each prediction includes a forecast for the next 25 hours.

An extract of LAMP predictions will be available with the following format:

timestamp: The time that the forecast was generated.
forecast_timestamp: The time for which the forecast is predicting weather conditions.
temperature: Temperature in degrees Fahrenheit.
wind_direction: Wind direction in compass heading divided by 10 and rounded to the nearest integer (to match runway codes).
wind_speed: Wind speed in knots.
wind_gust: Wind gust speed in knots.
cloud_ceiling: Cloud ceiling height in feet encoded as category indices. 1: <200 feet, 2: 200–400 feet, 3: 500–900 feet, 4: 1,000–1,900 feet, 5: 2,000–3,000 feet, 6: 3,100–6,500 feet, 7: 6,600–12,000 feet, 8: >12,000 feet.
visibility: Visibility in miles encoded as category indices. 1: <½ mile, 2: ½–1 mile, 3: 1–2 miles, 4: 2–3 miles, 5: 3–5 miles, 6: 6 miles, 7: >6 miles.
cloud: Total sky cover category. "BK": broken, "CL": clear, "FEW": few, "OV": overcast, "SC": scattered.
lightning_prob: Probability of lightning categorized as "N": none, "L": low, "M": medium, "H": high.
precip: Boolean indicating whether precipitation is expected. True: precipitation is expected, False: no precipitation expected.
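To make the retrospective versus historical distinction concrete, here is a minimal sketch of picking the freshest forecast for a target time that had already been issued at prediction time. The function name freshest_forecast and the sample values are hypothetical; timestamp, forecast_timestamp, and wind_speed are the columns listed above.

```python
from datetime import datetime


def freshest_forecast(rows, prediction_time, target_time):
    """Latest-issued forecast for target_time that existed at prediction_time."""
    candidates = [
        row for row in rows
        if row["timestamp"] <= prediction_time
        and row["forecast_timestamp"] == target_time
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["timestamp"])


noon = datetime(2021, 6, 1, 12, 0)
# Hypothetical LAMP rows: three successive forecasts for the same target hour.
lamp_rows = [
    {"timestamp": datetime(2021, 6, 1, 7, 30), "forecast_timestamp": noon, "wind_speed": 8},
    {"timestamp": datetime(2021, 6, 1, 8, 30), "forecast_timestamp": noon, "wind_speed": 11},
    {"timestamp": datetime(2021, 6, 1, 9, 30), "forecast_timestamp": noon, "wind_speed": 14},
]

# At 09:00 the 09:30 forecast does not exist yet, so the 08:30 one is used.
chosen = freshest_forecast(lamp_rows, datetime(2021, 6, 1, 9, 0), noon)
print(chosen["timestamp"], chosen["wind_speed"])
```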

For more information about LAMP, check out the About page and LAMP paper.

Past and current airport configuration

Although your model will output predictions for future airport configurations, past and current airport configurations are valid input features.

To understand airport configuration, we first need to learn how runways are named. We tend to think of a runway as a physical strip of asphalt, but in practice runways are identified by codes that communicate the physical strip plus the direction that traffic flows on that strip (an important distinction, as you might imagine). The direction is expressed as the flow direction compass heading divided by 10, rounded to the nearest integer, and padded to two digits, so between 01 and 36. Each physical runway is then assigned two runway codes for flow directions rotated 180° from each other, for example 18 and 36. In airports where two runways share the same heading, L (left) and R (right) suffixes are used to disambiguate the runways. Because left and right are judged from the direction of travel, the suffix flips at the opposite end of the strip, e.g., 18L/36R and 18R/36L. If there is a third, the C (center) suffix is used. In airports where four or more runways share the same heading, one runway is assigned a code using the actual heading, and subsequent runways are assigned a code using the actual heading decremented by one (two, three, and so on) so that all runways have unique codes.
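The numeric part of the naming rule can be sketched as follows. This is an illustration of the arithmetic only: the function names are hypothetical, suffixes (L, C, R) and the decrement rule for four or more parallel runways are not handled, and half-way values are rounded up by assumption. Note also that the example tables later on this page show codes such as 8L without the leading zero, so the actual data may not always be padded.

```python
def runway_number(heading_degrees):
    """Compass heading divided by 10, rounded to the nearest integer, in 1..36."""
    number = int(heading_degrees / 10 + 0.5) % 36
    return 36 if number == 0 else number


def reciprocal_number(number):
    """Runway number for the same strip used in the opposite direction."""
    return (number + 17) % 36 + 1


def format_code(number):
    """Pad the runway number to two digits, as described above."""
    return f"{number:02d}"


print(format_code(runway_number(178)), format_code(reciprocal_number(runway_number(178))))
print(format_code(runway_number(92)), format_code(reciprocal_number(runway_number(92))))
```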

With these runway codes defined, the entire airport configuration can be expressed as a string that combines the active departure and arrival runways. The string starts with "D_" for departure followed by the codes for all runways active for departures separated by an underscore, followed by "_A_" for arrival followed by the codes for all runways active for arrival separated by an underscore. For example, the airport configuration code "D_01L_18L_A_01R_18R" indicates that runways 01L and 18L are being used for departing flights and runways 01R and 18R are being used for arriving flights.
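A configuration string in this layout can be split back into its runway lists. The sketch below assumes every string follows the "D_..._A_..." pattern described above; the function name parse_config is hypothetical, and collapsed labels such as "other" are rejected rather than parsed.

```python
def parse_config(config):
    """Split a configuration string into departure and arrival runway codes."""
    if not config.startswith("D_") or "_A_" not in config:
        raise ValueError(f"not a D_..._A_... configuration string: {config!r}")
    departures, arrivals = config[2:].split("_A_", 1)
    return {
        "departures": departures.split("_"),
        "arrivals": arrivals.split("_"),
    }


parsed = parse_config("D_01L_18L_A_01R_18R")
print(parsed["departures"], parsed["arrivals"])
```

Parsed runway lists could then be turned into features, for example the number of active arrival runways.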

Airport configuration timing for the entire training period is provided in a file <airport>_airport_configuration.csv for each airport with the following columns:

timestamp: Start time of the configuration, in the format 2021-12-31 23:59:59.
airport_config: The airport configuration as a string encoding the active departure and arrival runways.
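Since each row records only the start time of a configuration, finding the configuration in effect at a given moment means locating the last row that starts at or before it. A minimal sketch, assuming the rows are sorted by timestamp and that a configuration stays active until the next row begins (the function name and sample rows are hypothetical):

```python
from bisect import bisect_right
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the example 2021-12-31 23:59:59


def active_config(changes, when):
    """Return the configuration whose start time is the latest one <= when."""
    starts = [start for start, _ in changes]
    index = bisect_right(starts, when) - 1
    return changes[index][1] if index >= 0 else None


# Hypothetical rows shaped like <airport>_airport_configuration.csv.
raw_rows = [
    ("2021-06-01 04:10:00", "D_10_8L_A_10_8L"),
    ("2021-06-01 09:45:00", "D_10_8R_A_10_8R"),
    ("2021-06-01 15:20:00", "D_26L_27L_A_26R_27L_28"),
]
changes = [(datetime.strptime(ts, TIME_FORMAT), config) for ts, config in raw_rows]

print(active_config(changes, datetime(2021, 6, 1, 12, 0)))
```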

Additional datasets

The official competition dataset includes important predictors of airport configuration, but there may be other datasets that provide useful information. Additional data sources may be explored and incorporated during model training. However, only pre-approved data sources are allowed for generating predictions during evaluation.

If you would like for any additional sources to be approved for use during inference, you are welcome to submit an official request form and the challenge organizers will review the request. Only select sources that demonstrate a strong case for use will be considered.

To be considered for approval, data sources must meet the following minimum requirements:

Free and publicly available.
Produced with sufficient stability and low latency to be used in a real-time predictive application.
Has a clear way to avoid leaking future information, for instance a timestamp indicating when the data would be available for use in each prediction.
Provides clear value beyond existing approved sources.

Any requests to add approved data sources must be received by March 15 to be considered. Approved data sources will be added to the competition data and will be made available for all participants.

Sherlock OpenData Warehouse

One potential source for flight-related data is the Sherlock OpenData Warehouse, which provides access to a variety of NAS data in aggregated forms. This includes airport and runway usage over time, flight traffic aggregated by sector and center, TRACON reports, and more. While some of this data is similar to what will be available from Fuser, it is processed in a different way and uses some different sources. Note that data in Sherlock has undergone less scrutiny than the Fuser data and may have some gaps or quality issues. When using this data, it may be important to validate beforehand that the desired data is available. Access to Sherlock requires a free NASA guest account; see "Sherlock access instructions" on the Data Download page for instructions on how to sign up for one. Any requests for Sherlock data to be approved must indicate the specific dataset being requested.

Of course, Sherlock isn't the only option; part of the potential for innovation comes from discovering those datasets and how best to incorporate them. We encourage you to be creative!

Labels

The target variable is the actual airport configuration for each airport every 30 minutes up to 6 hours into the future. Only one configuration is active at a time. Airports do not use all configurations equally: some are used very often and others not at all. Any configuration that is active less than 30 hours in the entire dataset is collapsed into an "other" category.
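The collapsing rule can be sketched from a configuration change log. This is one possible reading, not the organizers' exact procedure: it assumes each configuration lasts until the next one starts, that the last one lasts until a given dataset end time, and the function names and sample times are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def hours_active(changes, dataset_end):
    """Total hours each configuration was active, from sorted (start, config) pairs."""
    totals = defaultdict(float)
    following = changes[1:] + [(dataset_end, None)]
    for (start, config), (next_start, _) in zip(changes, following):
        totals[config] += (next_start - start).total_seconds() / 3600
    return dict(totals)


def collapse_rare(config, totals, min_hours=30):
    """Map configurations active for fewer than min_hours to "other"."""
    return config if totals.get(config, 0) >= min_hours else "other"


# Hypothetical change log for one airport.
changes = [
    (datetime(2021, 1, 1, 0, 0), "D_10_8L_A_10_8L"),
    (datetime(2021, 1, 3, 0, 0), "D_10_8R_A_10_8R"),
    (datetime(2021, 1, 3, 10, 0), "D_10_8L_A_10_8L"),
    (datetime(2021, 1, 4, 0, 0), "D_26L_27L_A_26R_27L_28"),
]
totals = hours_active(changes, dataset_end=datetime(2021, 1, 6, 0, 0))
labels = [collapse_rare(config, totals) for _, config in changes]
print(totals)
print(labels)
```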

Predictions are made every hour. Here is an example showing predictions at 03:00:00 and 04:00:00 at KATL. You can see that each timestamp includes predictions for multiple times in the future, or "lookaheads".

airport	timestamp	lookahead	D_10_8L_A_10_8L	D_10_8R_9L_A_10_8L_9R	D_10_8R_A_10_8R	D_26L_27L_A_26R_27L_28	other
katl	2020-11-15 03:00:00	30	1.0	0.0	0.0	0.0	0.0
		60	0.0	1.0	0.0	0.0	0.0
		90	0.0	0.0	1.0	0.0	0.0
		...
		360	0.0	0.0	0.0	1.0	0.0
	2020-11-15 04:00:00	30	0.0	0.0	1.0	0.0	0.0
		60	0.0	1.0	0.0	0.0	0.0
		90	0.0	0.0	0.0	0.0	1.0
		...
		360	0.0	0.0	0.0	1.0	0.0

In the above table, each airport configuration is a column, making it easy to see that only one configuration is active at a time. However, because different airports have different numbers of configurations (and therefore different numbers of columns), the actual labels file uses a "tidy" format, in which configuration columns are stacked as individual rows. Here is the same data example in the tidy tabular format that the labels file uses:

airport	timestamp	lookahead	config	active
katl	2020-11-15 03:00:00	30	katl:D_10_8L_A_10_8L	1.0
			katl:D_10_8R_9L_A_10_8L_9R	0.0
			katl:D_10_8R_A_10_8R	0.0
			katl:D_26L_27L_A_26R_27L_28	0.0
			katl:other	0.0
		60	katl:D_10_8L_A_10_8L	0.0
			katl:D_10_8R_9L_A_10_8L_9R	1.0
			katl:D_10_8R_A_10_8R	0.0
			katl:D_26L_27L_A_26R_27L_28	0.0
			katl:other	0.0
		90	katl:D_10_8L_A_10_8L	0.0
			katl:D_10_8R_9L_A_10_8L_9R	0.0
			katl:D_10_8R_A_10_8R	1.0
			katl:D_26L_27L_A_26R_27L_28	0.0
			katl:other	0.0
		...	...	...
		360	katl:D_10_8L_A_10_8L	0.0
			katl:D_10_8R_9L_A_10_8L_9R	0.0
			katl:D_10_8R_A_10_8R	0.0
			katl:D_26L_27L_A_26R_27L_28	1.0
			katl:other	0.0

A value of 1.0 indicates that the configuration was active at the airport lookahead minutes after the timestamp. Note that this format contains redundant information; the active configuration for 9:00 at a 2-hour lookahead (11:00) is the same as the configuration for 10:00 at a 1-hour lookahead (also 11:00). Each configuration code includes an airport prefix, e.g., katl:D_10_8L_A_10_8L refers to the configuration D_10_8L_A_10_8L at KATL, to distinguish it from the same configuration code at another airport.
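As an illustration of the stacking, the sketch below turns one wide row into tidy rows with the airport prefix on each configuration code. The function name to_tidy and the dictionary layout are hypothetical; the column names and example values follow the tables above.

```python
# The 12 lookaheads: every 30 minutes up to 6 hours (360 minutes).
LOOKAHEADS = list(range(30, 361, 30))


def to_tidy(airport, timestamp, lookahead, wide_row):
    """Stack one wide row (config -> value) into one tidy row per configuration."""
    return [
        {
            "airport": airport,
            "timestamp": timestamp,
            "lookahead": lookahead,
            "config": f"{airport}:{config}",
            "active": value,
        }
        for config, value in wide_row.items()
    ]


wide_row = {
    "D_10_8L_A_10_8L": 1.0,
    "D_10_8R_9L_A_10_8L_9R": 0.0,
    "D_10_8R_A_10_8R": 0.0,
    "D_26L_27L_A_26R_27L_28": 0.0,
    "other": 0.0,
}
tidy = to_tidy("katl", "2020-11-15 03:00:00", LOOKAHEADS[0], wide_row)
for row in tidy:
    print(row["config"], row["active"])
```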

Submission format

The Prescreened Arena is a code execution arena! Rather than submitting a CSV of predictions, participants will package everything needed to perform inference and submit for containerized execution on the cloud. Check out the code submission format page for complete details.

You may submit as many times as the stated submission limit allows. The last successfully-executed submission made in the Prescreened Arena will be used as the final scoring submission for each team. This is the submission that will be run on the final scoring dataset to determine prize rankings.

Your code submission will run on the provided test data and must output a tidy CSV as described in the Open Arena. The output CSV will contain predictions for each airport, timestamp, airport configuration, and lookahead in the dataset. Predictions must be valid probabilities to be prize-eligible; that is, predictions for airport configurations at a single timestamp, airport, and lookahead time must sum to 1.

The following is an example of output predictions for three lookahead times at a single airport and timestamp. Note that the predictions for the 5 configurations at the 30-minute lookahead (0.313010, 0.216761, 0.089255, 0.206822, and 0.174152) sum to 1.

airport	timestamp	lookahead	config	active
katl	2020-11-15 03:00:00	30	katl:D_10_8L_A_10_8L	0.313010
			katl:D_10_8R_9L_A_10_8L_9R	0.216761
			katl:D_10_8R_A_10_8R	0.089255
			katl:D_26L_27L_A_26R_27L_28	0.206822
			katl:other	0.174152
		60	katl:D_10_8L_A_10_8L	0.400173
			katl:D_10_8R_9L_A_10_8L_9R	0.139793
			katl:D_10_8R_A_10_8R	0.061403
			katl:D_26L_27L_A_26R_27L_28	0.018932
			katl:other	0.379700
		90	katl:D_10_8L_A_10_8L	0.468433
			katl:D_10_8R_9L_A_10_8L_9R	0.099560
			katl:D_10_8R_A_10_8R	0.235323
			katl:D_26L_27L_A_26R_27L_28	0.127383
			katl:other	0.069301
		...	...	...
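Before writing the output CSV, it may be worth checking the sum-to-1 requirement for every airport, timestamp, and lookahead group. The sketch below is one way to do that; the function names are hypothetical, and the 1e-6 tolerance is an assumption, since this page does not state how strictly the sum is checked. The normalize helper assumes every group has a positive total.

```python
import math
from collections import defaultdict


def group_key(row):
    return (row["airport"], row["timestamp"], row["lookahead"])


def group_sums(rows):
    """Sum of predicted probabilities per (airport, timestamp, lookahead)."""
    sums = defaultdict(float)
    for row in rows:
        sums[group_key(row)] += row["active"]
    return dict(sums)


def invalid_groups(rows, tolerance=1e-6):
    """Groups whose probabilities do not sum to 1 within the assumed tolerance."""
    return [
        key for key, total in group_sums(rows).items()
        if not math.isclose(total, 1.0, abs_tol=tolerance)
    ]


def normalize(rows):
    """Rescale each group so that its probabilities sum to 1."""
    sums = group_sums(rows)
    return [{**row, "active": row["active"] / sums[group_key(row)]} for row in rows]


# The 30-minute lookahead values from the example above.
values = [0.313010, 0.216761, 0.089255, 0.206822, 0.174152]
rows = [
    {"airport": "katl", "timestamp": "2020-11-15 03:00:00", "lookahead": 30,
     "config": f"katl:config_{i}", "active": value}
    for i, value in enumerate(values)
]
print(invalid_groups(rows))

# Unnormalized scores can be rescaled before submission.
raw = [{**row, "active": score} for row, score in zip(rows, [2.0, 1.0, 1.0, 0.0, 0.0])]
print([row["active"] for row in normalize(raw)])
```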

Performance metric

We'll measure your model's prediction error with a metric called log loss. This is an error metric, so a lower value is better (as opposed to an accuracy metric, where a higher value is better). Log loss for a single observation is calculated as follows:

$$L_{\log}(y, p) = -(y \log (p) + (1 - y) \log (1 - p))$$

where $y$ is a binary variable indicating whether the airport configuration was active (1) or not (0), and $p$ is the user-predicted probability that the configuration is active. To equally weight airports with different numbers of configurations, we will compute the log loss for each airport individually, then take the mean of the airport-specific losses.

Log loss is undefined for samples where the label equals 0 and prediction equals 1 (also when the label equals 1 and prediction equals 0). To prevent undefined losses, predictions are clipped to be in the range 1e-16 to (1 - 1e-16) prior to scoring. Even with that constraint, extremely confident wrong predictions can lead to astronomically high losses; you will want to experiment with the range of probabilities your model assigns.
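The metric described above can be sketched in a few lines. This is an approximation for local experimentation rather than the official scorer: it assumes the airport-specific loss is the plain mean over that airport's rows, and the function names and sample rows are hypothetical. The clipping range matches the one stated above.

```python
import math
from collections import defaultdict

EPS = 1e-16  # clipping bound stated in the problem description


def log_loss(y, p, eps=EPS):
    """Log loss for one observation, with p clipped to [eps, 1 - eps]."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def mean_airport_log_loss(rows):
    """Mean of per-airport mean losses; rows are (airport, y, p) tuples."""
    losses = defaultdict(list)
    for airport, y, p in rows:
        losses[airport].append(log_loss(y, p))
    airport_means = [sum(values) / len(values) for values in losses.values()]
    return sum(airport_means) / len(airport_means)


rows = [
    ("katl", 1, 0.8),
    ("katl", 0, 0.2),
    ("kdfw", 1, 0.5),
    ("kdfw", 0, 0.5),
]
score = mean_airport_log_loss(rows)
print(round(score, 4))

# A fully confident wrong prediction stays finite only because of clipping.
print(log_loss(1, 0.0))
```

With the clipping bound of 1e-16, a single confidently wrong row costs roughly 37, which is why hedging extreme probabilities can matter more than small gains elsewhere.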

Good luck!

Good luck and enjoy this challenge! Check out the benchmark blog post for tips on how to get started. If you have any questions you can always visit the user forum.

Run-way Functions: Predict Reconfigurations at US Airports (Prescreened Arena)

Winner

Stuytown2