Pushback to the Future: Predict Pushback Time at US Airports (Phase 2)

Welcome Phase 1 finalists to Phase 2 of the Predict Pushback Challenge! Now's your chance to explore translating your winning Phase 1 model into a decentralized federated learning model.

jul 2023

Problem description

The goal of Phase 2 is to explore federated learning techniques that enable independent clients (also known as parties, organizations, groups) to jointly train a global model without sharing or pooling data. In this case, the clients are airlines. Federated learning is a perfect match for the problem of pushback time prediction because airlines collect a lot of information that is relevant to pushback time, but too valuable or sensitive to share, like the number of passengers that have checked in for a flight or the number of bags that have been loaded onto a plane. Federated learning enables airlines to safely contribute the valuable and sensitive data they collect towards a centralized model.

In Phase 2, you'll translate your winning Phase 1 model into a model that can be trained in a federated manner. You will work with the same data from Phase 1, only it will be divided into "public" (or "non-federated") variables and "private" (or "federated") variables. Public variables are available to all airlines. Private variables for a given flight are only available to the airline that operated the flight.

In the spirit of experimentation, you will have some flexibility in how you translate your Phase 1 winning model during Phase 2. Do some strategies for combining the models from individual clients work better than others? Are there basic changes that your model needs to operate in a federated setting? We want to know!

Data

In the development period, we will provide training features and labels that are partitioned to show which data is available to all airlines versus only available to specific airlines. The training data includes 10 airports. Each airport's training features and labels are provided in a separate tar archive of bzip2-compressed CSVs. Head to the Data download page to get the data, and read on to learn more about the data format.

Limiting predictions to the most active airlines

The vast majority of flights are operated by a small subset of the total number of airlines present in the full dataset. For simplicity and efficiency, you will only be asked to predict for flights from 25 of the most active airlines. These are: AAL, AJT, ASA, ASH, AWI, DAL, EDV, EJA, ENY, FDX, FFT, GJS, GTI, JBU, JIA, NKS, PDT, QXE, RPA, SKW, SWA, SWQ, TPA, UAL, UPS.

Not all of these 25 airlines operate flights out of every airport. If an airline does not operate any flights at an airport, its airline-specific private feature files for that airport will contain only a header line. Note that although you will only be asked to make predictions for these airlines, you will have access to public data from all airlines (and of course, none of the private data from airlines that are not among these 25).

Train features

The features used in Phase 2 will be the same data used in Phase 1, with a few key differences:

  • All public data (all airline data, available to all airlines) are stored in a public root directory. Public versions of the mfs and standtimes files include only the public (non-federated) variables, and not the private (federated) variables.
  • All private data are stored in a private root directory. For each airport, each of the 25 selected airlines has mfs and standtimes files containing the public and private columns.

Here is an example of the training features within one airport archive, shown for KATL:

private
└── KATL
    ├── KATL_AAL_mfs.csv.bz2         # rows for AAL airline, all columns
    ├── KATL_AAL_standtimes.csv.bz2  # rows for AAL airline, all columns
    ├── ...
    ├── KATL_UPS_mfs.csv.bz2         # rows for UPS airline, all columns
    └── KATL_UPS_standtimes.csv.bz2  # rows for UPS airline, all columns

public
└── KATL
    ├── KATL_config.csv.bz2
    ├── KATL_etd.csv.bz2
    ├── KATL_first_position.csv.bz2
    ├── KATL_lamp.csv.bz2
    ├── KATL_mfs.csv.bz2         # rows for all airlines, only public columns
    ├── KATL_runways.csv.bz2
    ├── KATL_standtimes.csv.bz2  # rows for all airlines, only public columns
    ├── KATL_tbfm.csv.bz2
    └── KATL_tfm.csv.bz2

The columns and rows contained in each file depend on whether the feature is public or private.

MFS

  • Public files contain the public columns: gufi, isdeparture
  • Private files contain the public columns above in addition to the following private columns: aircraft_engine_class, aircraft_type, major_carrier, flight_type

Standtimes

  • Public files contain the public columns: gufi, timestamp, departure_stand_actual_time
  • Private files contain the public columns above in addition to the following private column: arrival_stand_actual_time

Public files contain rows for all flights, regardless of airline. Private files are limited to rows for a single airline, e.g., KATL_AAL_mfs.csv.bz2 only includes rows for flights operated by American Airlines (AAL). Refer to the Phase 1 documentation for variable descriptions.
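As a rough illustration of the layout above, here is a minimal sketch of locating and reading a feature file. It assumes the airport archive has already been extracted under some data root; the helper names `feature_path` and `read_rows` and the root directory name are hypothetical, not part of the challenge materials.

```python
import bz2
import csv
from pathlib import Path


def feature_path(root, airport, feature, airline=None):
    """Build the path to a feature file under an extracted data root.

    With airline=None the path points at the public file (all airlines,
    public columns only). With an airline code it points at that airline's
    private file (its own rows, public and private columns).
    """
    root = Path(root)
    if airline is None:
        return root / "public" / airport / f"{airport}_{feature}.csv.bz2"
    return root / "private" / airport / f"{airport}_{airline}_{feature}.csv.bz2"


def read_rows(path):
    """Read a bzip2-compressed CSV into a list of dicts, one per data row.

    A file that contains only a header line yields an empty list, which is
    what an airline with no flights at an airport would produce.
    """
    with bz2.open(path, mode="rt", newline="") as f:
        return list(csv.DictReader(f))
```

For example, `feature_path("data", "KATL", "mfs", airline="AAL")` points at `data/private/KATL/KATL_AAL_mfs.csv.bz2`, while omitting `airline` points at the public `KATL_mfs.csv.bz2`.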

Train labels

The archives also include training labels to assist in the development of your model. You are not required to use these during training, but they could be a useful starting point. The Phase 2 training labels are identical to those provided in the Prescreened Arena, but subset to only the selected airlines. As in Phase 1, each file contains the actual minutes to pushback for a GUFI at a particular time. For more detail, see the label description from Phase 1.

For example, here are the first five rows of phase2_train_labels_KATL.csv.bz2:

gufi,timestamp,airport,minutes_until_pushback
AAL1008.ATL.DFW.210403.1312.0051.TFM_TFDM,2021-04-03 19:30:00,KATL,114
AAL1008.ATL.DFW.210403.1312.0051.TFM_TFDM,2021-04-03 19:45:00,KATL,99
AAL1008.ATL.DFW.210403.1312.0051.TFM_TFDM,2021-04-03 20:00:00,KATL,84
AAL1008.ATL.DFW.210403.1312.0051.TFM_TFDM,2021-04-03 20:15:00,KATL,69
AAL1008.ATL.DFW.210403.1312.0051.TFM_TFDM,2021-04-03 20:30:00,KATL,54
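
One way to sanity-check how the labels read: if minutes_until_pushback is taken as the gap between the timestamp and the actual pushback, then every row for a flight should imply the same pushback time. The sketch below checks this on the five sample rows; the helper name `implied_pushback` is hypothetical, and the interpretation is an assumption based on the label description rather than an official definition.

```python
import csv
import io
from datetime import datetime, timedelta

# The five sample rows shown above, embedded so the sketch is self-contained.
SAMPLE = """gufi,timestamp,airport,minutes_until_pushback
AAL1008.ATL.DFW.210403.1312.0051.TFM_TFDM,2021-04-03 19:30:00,KATL,114
AAL1008.ATL.DFW.210403.1312.0051.TFM_TFDM,2021-04-03 19:45:00,KATL,99
AAL1008.ATL.DFW.210403.1312.0051.TFM_TFDM,2021-04-03 20:00:00,KATL,84
AAL1008.ATL.DFW.210403.1312.0051.TFM_TFDM,2021-04-03 20:15:00,KATL,69
AAL1008.ATL.DFW.210403.1312.0051.TFM_TFDM,2021-04-03 20:30:00,KATL,54
"""


def implied_pushback(row):
    """Return timestamp + minutes_until_pushback for one label row."""
    ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
    return ts + timedelta(minutes=int(row["minutes_until_pushback"]))


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
# Collect the distinct pushback times implied by the rows of this one flight.
pushbacks = {implied_pushback(r) for r in rows}
print(pushbacks)
```

All five rows imply the same pushback time, 2021-04-03 21:24:00, since the label drops by 15 minutes for every 15-minute step in the timestamp.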

Federated training

As we mentioned, the federated clients in Phase 2 are airlines. The airline is indicated by the first three letters of the GUFI, e.g., a flight with GUFI AAL1007.DFW.MEM.211224.0015.0120.TFM is operated by "AAL" (American Airlines) and a flight with GUFI UAL2477.MEM.EWR.220710.1915.0167.TFM is operated by "UAL" (United Airlines).
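
A minimal sketch of this rule, useful for assigning rows to clients and for checking that a client only touches private features for its own flights. The function names `airline_of` and `may_use_private` are hypothetical; the airline list is the 25 codes given earlier.

```python
# The 25 selected airlines listed in the problem description.
SELECTED_AIRLINES = frozenset(
    "AAL AJT ASA ASH AWI DAL EDV EJA ENY FDX FFT GJS GTI "
    "JBU JIA NKS PDT QXE RPA SKW SWA SWQ TPA UAL UPS".split()
)


def airline_of(gufi):
    """Return the operating airline: the first three letters of the GUFI."""
    return gufi[:3]


def may_use_private(client_airline, gufi):
    """True if this client may read private features for this flight.

    A client may only use private features for flights it operated, and
    only the 25 selected airlines act as clients.
    """
    return client_airline in SELECTED_AIRLINES and airline_of(gufi) == client_airline


print(airline_of("AAL1007.DFW.MEM.211224.0015.0120.TFM"))
print(may_use_private("UAL", "AAL1007.DFW.MEM.211224.0015.0120.TFM"))
```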

During training, you will simulate an AAL client that trains using its rows of the federated variables plus any of the public data and transmits model updates to a centralized server. A UAL client and clients for the other airlines will do the same. The centralized server can use different methods to weight and aggregate all of the individual model weights into a single global model.
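
To make the aggregation step concrete, here is a minimal sketch of one common choice: averaging client weights in proportion to the number of training examples each client used (often called federated averaging). It is written in plain Python rather than with any framework, it is only one of many possible aggregation methods, and the function name and the flat-list weight representation are assumptions for illustration.

```python
def weighted_average(client_updates):
    """Aggregate client weights into one global weight vector.

    client_updates is a list of (weights, num_examples) pairs, where
    weights is a flat list of floats of the same length for every client.
    Each client's contribution is proportional to its example count.
    """
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("no training examples reported by any client")

    size = len(client_updates[0][0])
    averaged = [0.0] * size
    for weights, n in client_updates:
        if len(weights) != size:
            raise ValueError("all clients must send weights of the same length")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Two hypothetical clients: one trained on 1 example, the other on 3.
global_weights = weighted_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)])
print(global_weights)
```

With these inputs the second client counts three times as much as the first, so the result is [2.5, 3.5]. Whether example-count weighting suits airlines with very different flight volumes is one of the questions worth exploring.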

Flower: a federated learning framework

We require you to use the Flower federated learning framework to train your model. Flower's design includes a few abstractions structured to match the general concepts in federated learning, which should help you focus on honing your algorithm without needing to reimplement the basics of federated learning from scratch.

You will be implementing your solution to work with the Client and Strategy APIs. Client classes hold the logic that is executed by the federation units that have access to their private data, while the Server class holds the logic for coordinating among the clients and aggregating results. For the server, Flower additionally separates the federated learning algorithm logic from the networking logic—the algorithm logic is handled by a Strategy class that the server dispatches to. This allows Flower to provide a default Server implementation which handles networking that generally does not need to be customized, while the Strategy can be swapped out or customized to handle different federated learning approaches.

Rather than performing federated learning across multiple machines (as it would be in the real world), you most likely will want to simulate federated learning on a single machine. For this, you can use Flower's virtual client engine.

For a worked example of federated training and inference on this dataset, check out the simple linear regression model.

Flower has extensive documentation for how to get started.

Evaluating your model

During prediction, each airline will use the final trained model to make pushback predictions for all of the flights it operates. To directly compare your Phase 1 centralized model with your Phase 2 federated model, both will be evaluated on the exact same dataset: the held-out test set from Phase 1—the only difference is that the private variables can only be accessed by the airline that produced them.

You will be responsible for implementing your own evaluation script, which you will document and submit as part of your development submission. For each prediction, you will need to make sure that:

  • You do not use private features from another airline. Be sure to check that the only private features you are using are those from the airline that operated the given flight.

  • You only use data that was available from the previous 30 hours up through the time of estimation, just as in Phase 1. For example, if you are predicting pushback for a flight on December 31, 2022 at 12 pm, you can use any data that was available between 6 am on December 30 and 12 pm on December 31.

All CSVs except for MFS files include a timestamp column that indicates the time the observation was made, which can be used to filter to the 30 hours before time of estimation.
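
A minimal sketch of that filter, assuming rows have been read into dicts with a timestamp string in the same format as the label files (the feature files may use a different format, so adjust the parsing as needed). Treating both ends of the window as inclusive is also an assumption; the names `window_start` and `within_window` are hypothetical.

```python
from datetime import datetime, timedelta

LOOKBACK = timedelta(hours=30)
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed; matches the label files


def window_start(estimation_time):
    """Earliest observation time allowed for a given time of estimation."""
    return estimation_time - LOOKBACK


def within_window(rows, estimation_time):
    """Keep rows observed in the 30 hours up through the time of estimation."""
    start = window_start(estimation_time)
    return [
        row
        for row in rows
        if start <= datetime.strptime(row["timestamp"], TIMESTAMP_FORMAT) <= estimation_time
    ]


# The example from the text: estimating at 12 pm on December 31, 2022.
print(window_start(datetime(2022, 12, 31, 12, 0)))
```

For the example above, the window opens at 2022-12-30 06:00:00, so any row observed before that, or after the time of estimation, is dropped.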

For the MFS data, which lacks a timestamp, the same rules apply as in Phase 1: You can look up MFS data for the current flight as well as GUFIs that occurred within the 30 hours up to the time of estimation. You may not analyze the entire MFS metadata file to, for example, directly incorporate the distribution of aircraft type or carriers into your solution.
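
One way to respect this rule is to restrict MFS lookups to GUFIs that appear in timestamped rows already limited to the 30-hour window, plus the flight being predicted. The sketch below assumes rows are dicts with a gufi key; the function name `allowed_mfs_rows` and its arguments are hypothetical.

```python
def allowed_mfs_rows(mfs_rows, windowed_rows, current_gufi):
    """Return only the MFS rows that may be used for one prediction.

    windowed_rows are rows from timestamped files that have already been
    limited to the 30 hours up to the time of estimation. MFS rows are kept
    only for GUFIs seen there, plus the flight being predicted.
    """
    allowed = {row["gufi"] for row in windowed_rows}
    allowed.add(current_gufi)
    return [row for row in mfs_rows if row["gufi"] in allowed]


# Tiny made-up example: "G3" never appears in the window, so it is excluded.
mfs = [{"gufi": "G1"}, {"gufi": "G2"}, {"gufi": "G3"}]
recent = [{"gufi": "G2", "timestamp": "2022-12-31 09:00:00"}]
usable = allowed_mfs_rows(mfs, recent, current_gufi="G1")
print([row["gufi"] for row in usable])
```

Filtering this way keeps statistics over the full MFS file, such as the overall distribution of aircraft types, out of the features.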

How you achieve this is up to you. The simplest approach may be to evaluate outside of the Flower framework. The simple federated linear regression model includes a prediction script to show one example of how inference might look.

We will verify that your code meets the requirements—clear documentation will speed up the verification process (and is generally appreciated!).

Good luck, and enjoy taking federated learning to the skies! If you have any questions, let us know in the forum!