Centralised Code Submission Format
This page documents the submission format for your centralised solution code for Track B: Pandemic Forecasting. Exact submission specifications may be subject to change until submissions open. Each team will be able to make one final submission for evaluation. In addition to local testing and experimentation, teams will also have limited access to test their solutions through the hosted infrastructure later in Phase 2. More details will be provided when submissions open.
Your centralised solution should be a version of your federated solution that runs with full data access and under no privacy threat. The performance of your federated solution relative to the performance of your centralised solution will be part of the evaluation of your overall solution.
The full source code and environment specification for how your code is run are available in the challenge runtime repository.
What to submit
Your submission should be a zip archive with the extension .zip (e.g., submission.zip). The root level of the archive must contain a solution_centralized.py module that defines the following named functions:
- fit: A function that fits your model on the training data and writes your model to disk
- predict: A function that loads your model and performs inference on the test data
When you make a submission, this will kick off a containerised evaluation job. This job will first run a main_centralized_train.py script, which imports your fit function and calls it with the appropriate training data access and filesystem access. The job will then run a main_centralized_test.py script, which imports your predict function and calls it with the appropriate test data access and filesystem access. Please see the following sections on the training and test stages for additional details and API specifications.
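The exact runner scripts are maintained in the runtime repository and take precedence over anything shown here. Purely as an illustration of the calling convention, the sketch below shows roughly how main_centralized_train.py might import and call your fit function; the data directory, CSV file names, and model directory are hypothetical placeholders. The test stage follows the same pattern with predict, adding the two prediction path arguments.

```python
# Illustrative sketch only -- the real runner lives in the challenge runtime
# repository and may differ. All paths and file names below are hypothetical.
from pathlib import Path

from solution_centralized import fit

DATA_DIR = Path("/data/centralized/train")  # hypothetical data location
MODEL_DIR = Path("/model")                  # persists between train and test stages

fit(
    person_data_path=DATA_DIR / "person.csv",
    household_data_path=DATA_DIR / "household.csv",
    residence_location_data_path=DATA_DIR / "residence_locations.csv",
    activity_location_data_path=DATA_DIR / "activity_locations.csv",
    activity_location_assignment_data_path=DATA_DIR / "activity_location_assignments.csv",
    population_network_data_path=DATA_DIR / "population_network.csv",
    disease_outcome_data_path=DATA_DIR / "disease_outcome.csv",
    model_dir=MODEL_DIR,
)
```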
Training
Your submitted solution_centralized.py module must contain a fit function that matches the API below. This fit function will be called with paths to the training data, which you can then load and use to train your model. It will also be called with a directory path model_dir, which you should use to save your model. The later test stage runs as a separate Python process that does not share any in-memory state with training. Therefore, saving to and loading from the model_dir directory is the only way for the test stage to access your trained model.
def fit(
    person_data_path: pathlib.Path,
    household_data_path: pathlib.Path,
    residence_location_data_path: pathlib.Path,
    activity_location_data_path: pathlib.Path,
    activity_location_assignment_data_path: pathlib.Path,
    population_network_data_path: pathlib.Path,
    disease_outcome_data_path: pathlib.Path,
    model_dir: pathlib.Path,
) -> None:
    """Function that fits your model on the provided training data and saves
    your model to disk in the provided directory.

    Args:
        person_data_path (Path): Path to CSV data file for the Person table.
        household_data_path (Path): Path to CSV data file for the Household table.
        residence_location_data_path (Path): Path to CSV data file for the
            Residence Locations table.
        activity_location_data_path (Path): Path to CSV data file for the
            Activity Locations table.
        activity_location_assignment_data_path (Path): Path to CSV data file
            for the Activity Location Assignments table.
        population_network_data_path (Path): Path to CSV data file for the
            Population Network table.
        disease_outcome_data_path (Path): Path to CSV data file for the Disease
            Outcome table.
        model_dir (Path): Path to a directory that is constant between the train
            and test stages. You must use this directory to save and reload
            your trained model between the stages.

    Returns: None
    """
    ...
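As a concrete but non-authoritative illustration of this contract, the sketch below shows one possible shape of a fit implementation. The modelling is a deliberate placeholder: it assumes pandas, scikit-learn, and joblib are available in the runtime environment, and the column names state and age and the infection state code "I" are assumptions about the data schema; consult the data overview page for the actual columns.

```python
import pathlib

import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression


def fit(
    person_data_path: pathlib.Path,
    household_data_path: pathlib.Path,
    residence_location_data_path: pathlib.Path,
    activity_location_data_path: pathlib.Path,
    activity_location_assignment_data_path: pathlib.Path,
    population_network_data_path: pathlib.Path,
    disease_outcome_data_path: pathlib.Path,
    model_dir: pathlib.Path,
) -> None:
    # This toy example only loads two of the available tables; a real
    # solution would use the full set of inputs.
    person = pd.read_csv(person_data_path)
    disease = pd.read_csv(disease_outcome_data_path)

    # Placeholder label construction: mark each person by whether they were
    # ever recorded as infected ("I") during the training window.
    # ("state" and "I" are assumed schema details.)
    labels = (
        disease.assign(infected=(disease["state"] == "I").astype(int))
        .groupby("pid")["infected"]
        .max()
    )

    # Placeholder features: a single assumed column from the Person table.
    features = person.set_index("pid")[["age"]].join(labels, how="inner")

    model = LogisticRegression()
    model.fit(features[["age"]], features["infected"])

    # Persist the trained model so the separate test process can reload it.
    joblib.dump(model, model_dir / "model.joblib")
```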
For more details on the input data files, please see the data overview page.
Test
Your submitted solution_centralized.py module must contain a predict function that matches the API below. This predict function will be called with paths to the data needed for test-time inference.
- You should not do any additional training during the test stage.
- You should load your saved trained model from the model_dir directory.
- Your predict function will be called with a preds_format_path to a CSV file that serves as a template for the predictions you must produce. Your predictions should be written to preds_dest_path.
def predict(
    person_data_path: pathlib.Path,
    household_data_path: pathlib.Path,
    residence_location_data_path: pathlib.Path,
    activity_location_data_path: pathlib.Path,
    activity_location_assignment_data_path: pathlib.Path,
    population_network_data_path: pathlib.Path,
    disease_outcome_data_path: pathlib.Path,
    model_dir: pathlib.Path,
    preds_format_path: pathlib.Path,
    preds_dest_path: pathlib.Path,
) -> None:
    """Function that loads your model from the provided directory and performs
    inference on the provided test data. Predictions should match the provided
    format and be written to the provided destination path.

    Args:
        person_data_path (Path): Path to CSV data file for the Person table.
        household_data_path (Path): Path to CSV data file for the Household table.
        residence_location_data_path (Path): Path to CSV data file for the
            Residence Locations table.
        activity_location_data_path (Path): Path to CSV data file for the
            Activity Locations table.
        activity_location_assignment_data_path (Path): Path to CSV data file
            for the Activity Location Assignments table.
        population_network_data_path (Path): Path to CSV data file for the
            Population Network table.
        disease_outcome_data_path (Path): Path to CSV data file for the Disease
            Outcome table.
        model_dir (Path): Path to a directory that is constant between the train
            and test stages. You must use this directory to save and reload
            your trained model between the stages.
        preds_format_path (Path): Path to CSV file matching the format you must
            write your predictions with, filled with dummy values.
        preds_dest_path (Path): Destination path that you must write your test
            predictions to as a CSV file.

    Returns: None
    """
    ...
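Pairing with the earlier fit sketch, here is an equally minimal, hypothetical predict implementation. It relies on the same assumed libraries, model file name, and column names as before, and shows the key obligations: reload the model from model_dir, use preds_format_path as the template, and write the filled-in scores to preds_dest_path.

```python
import pathlib

import joblib
import pandas as pd


def predict(
    person_data_path: pathlib.Path,
    household_data_path: pathlib.Path,
    residence_location_data_path: pathlib.Path,
    activity_location_data_path: pathlib.Path,
    activity_location_assignment_data_path: pathlib.Path,
    population_network_data_path: pathlib.Path,
    disease_outcome_data_path: pathlib.Path,
    model_dir: pathlib.Path,
    preds_format_path: pathlib.Path,
    preds_dest_path: pathlib.Path,
) -> None:
    # Reload the model persisted by fit; model_dir is the only shared state
    # between the train and test processes.
    model = joblib.load(model_dir / "model.joblib")

    # Use the provided format file as the template for rows and columns.
    preds = pd.read_csv(preds_format_path, index_col="pid")

    # Build the same placeholder features used at training time
    # ("age" is an assumed column name -- check the data overview page).
    person = pd.read_csv(person_data_path, index_col="pid")
    features = person.loc[preds.index, ["age"]]

    # Replace the dummy scores with predicted infection risks in [0.0, 1.0]
    # and write the result to the required destination.
    preds["score"] = model.predict_proba(features)[:, 1]
    preds.to_csv(preds_dest_path)
```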
Predictions Format
Your predict function should produce a predictions CSV file written to the provided preds_dest_path file path. Each row should correspond to one individual in the population, identified by the pid column. Each row should also have a float value in the range [0.0, 1.0] in the score column, representing a risk score that the individual will become infected during the test period of the synthetic outbreak. A higher score means higher confidence that the individual will become infected. A predictions format CSV is provided via preds_format_path that matches the rows and columns your predictions need to have, with dummy values for score. You can load that file and replace the score values with ones from your model.
| pid | score |
|-----|-------|
| 0   | 0.5   |
| 1   | 0.5   |
| 2   | 0.5   |
| ... | ...   |
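Before packaging your submission, a quick local sanity check on your predictions file can catch format mistakes early. The snippet below is optional and illustrative only (it is not part of the official evaluation), and the local file paths are placeholders.

```python
import pandas as pd

# Hypothetical local paths -- substitute wherever you keep these files.
preds_format = pd.read_csv("preds_format.csv")
preds = pd.read_csv("predictions.csv")

# Same columns and the same set of pids as the provided format file.
assert list(preds.columns) == list(preds_format.columns)
assert set(preds["pid"]) == set(preds_format["pid"])

# Scores must be present and lie in [0.0, 1.0].
assert preds["score"].notna().all()
assert preds["score"].between(0.0, 1.0).all()
```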
Data Access and Scope
Your code is called with a specific scope and data access. Please note that attempts to circumvent the structure of the setup are grounds for disqualification.
- Your code should not inspect the data or print any data to the logs.
- Your code should not directly access any data files other than what the evaluation runner explicitly provides to each function.
- Your code should not exceed its scope. Directly accessing or modifying any global variables or evaluation runner state is forbidden.
If in doubt about whether something is okay, you may email us or post on the forum.
Runtime specs
Your code is executed within a container that is defined in our runtime repository. The limits are as follows:
- Your submission must be written in Python (Python 3.9.13) and use the packages defined in the runtime repository.
- The submission must complete execution in 6 hours or less.
- The container has access to 6 vCPUs, 56 GB RAM, and 1 GPU with 12 GB of memory.
- The container will not have network access. All necessary files (code and model assets) must be included in your submission.
- The container execution will not have root access to the filesystem.
Requesting additional dependencies
Since the Docker container will not have network access, all dependency packages must be pre-installed in the container image. We are happy to consider additional packages as long as they are approved by the challenge organisers, do not conflict with each other, and can be built successfully. Packages must be available through conda for Python 3.9.13. To request that an additional package be added to the Docker image, follow the instructions in the runtime repository README.
Note: Since package installations need to be approved, be sure to submit any PRs requesting installation by January 11, 2023 to ensure they are incorporated in time for you to make a successful submission.
Optional: install.sh Script
Added January 11, 2023
A bash script named install.sh can optionally be included as part of your submission. If included, it will be sourced before the evaluation process runs. This is intended to give you flexibility in setting up the runtime environment, such as performing installation steps for dependencies that are bundled with your submission or setting necessary environment variables. If you use the install.sh script, your code guide should document and justify exactly what it is doing. You should not execute any code that represents a meaningful part of your solution's logic. Abuse of this script for unintended purposes is grounds for disqualification. If you have any questions, please ask on the community forum.
Happy building! Once again, if you have any questions or issues you can always head over to the user forum!