U.S. PETs Prize Challenge: Phase 2 (Financial Crime–Centralized)

Help unlock the potential of privacy-enhancing technologies (PETs) to combat global societal challenges. Develop efficient, accurate, and extensible federated learning solutions with strong privacy guarantees for individuals in the data. #privacy

$185,000 in prizes
Mar 2023
201 joined

Problem Description


The objective of the challenge is to develop a privacy-preserving federated learning solution that is capable of training an effective model while providing a demonstrable level of privacy against a broad range of privacy threats. The challenge organizers are interested in efficient and usable federated learning solutions that provide end-to-end privacy and security protections while harnessing the potential of AI for overcoming significant global challenges.

Solutions will tackle one or both of two tasks: financial crime prevention or pandemic forecasting. Teams will be required to submit both centralized and federated versions of their models.

This is the Financial Crime Track for Phase 2. The Pandemic Forecasting Track can be found here.

In Phase 2 of the challenge, you will develop a working prototype of the privacy-preserving federated learning solution that you proposed in Phase 1. As part of this phase, you will package your solution for containerized execution in a common environment provided for the testing, evaluation, and benchmarking of solutions.

Overview of what teams submit


You will need to submit the following for your solution by the end of Phase 2:

  1. Executable federated solution code—working implementation of your federated learning solution that is run via containerized execution.
  2. Executable centralized solution code—working implementation of a centralized version of your solution that is run via containerized execution.
  3. Documentation zip archive—a zip archive that contains the following documentation items:
    • Technical paper—an updated paper that refines and expands on your Phase 1 concept paper, including your own experimental privacy, accuracy, and efficiency metrics for the two models.
    • Code guide—README files which provide the mapping between components of your solution methodology and the associated components in your submitted source code, to help reviewers identify which parts of your source code perform which parts of your solution.

Each team will be able to make one final submission for evaluation. In addition to local testing and experimentation, teams will also have limited access to test their solutions through the hosted infrastructure later in Phase 2.

If your team is participating in both data tracks (whether with separate solutions or a generalized solution), you are required to submit all items for both tracks. Additional details regarding each of these items are provided below.

Finalists will be required to submit a form, completed by their team's Official Representative, regarding eligibility for challenge prizes. The form will be made available to finalists by email.

Data Tracks

In Phase 2, prizes are awarded in separate categories for top solutions for each of the two Data Tracks as well as for Generalized Solutions. Teams can win prizes in more than one of the three categories, and were required in Phase 1 to declare which prize categories their solutions apply to.

Teams have chosen either to develop a solution for Track A or Track B, or to develop two solutions, one for each track. In Phase 2, each solution is eligible to be considered for a prize in its respective track's top solutions category. Teams with two solutions may therefore win up to two top solution prizes in Phase 2, one for each data track.

Alternatively, teams chose to develop a single generalized solution. A generalized solution is one where the same core privacy techniques and proofs apply to both use cases, and adaptations to specific use cases are relatively minor and separable from the shared core. (The specific privacy claims or guarantees as a result of those proofs may differ by use case.) Generalized solutions may win up to three prizes—one from each of the two data track prize categories, and a third from a prize category for top generalized solutions.

Code Execution


Federated Code Execution

As part of the challenge, all solutions will be deployed and evaluated on a technical infrastructure providing a common environment for the testing, evaluation, and benchmarking of solutions. To run on this infrastructure, your solution will need to conform to the provided API specification.

You will submit a code implementation of your federated learning solution to the challenge platform. The evaluation harness will run your code under simulated federated training and inference on a single node on multiple predefined data partitioning scenarios that are the same for all teams. Runtime and accuracy metrics resulting from the evaluation run will be captured and incorporated as part of the overall evaluation of Phase 2 solutions.

The execution harness for this challenge will simulate the federation in your solution in a single containerized node using the virtual client engine from the Flower federated learning framework, a Python library. The API specification will be based on Flower's Client and Strategy interfaces. You may use federated learning frameworks other than Flower and programming languages other than Python; however, you will need to wrap such elements of your solution in Python code that conforms to the API specifications.
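
For orientation, below is a minimal, self-contained sketch of the shape of the Flower interfaces that the harness builds on. It is not the challenge's API specification: the client class, the toy weight-vector "model", the random data, and the local smoke-test call are all illustrative assumptions, and exact Flower signatures vary by version.

```python
# A minimal sketch of the Flower Client/Strategy shape, NOT the challenge's exact
# API specification. The "model" is just a weight vector and the data are random,
# purely to show where a team's training and privacy logic would plug in.
import numpy as np
import flwr as fl


class SketchClient(fl.client.NumPyClient):
    """One simulated client holding one data partition."""

    def __init__(self, data: np.ndarray):
        self.data = data
        self.weights = np.zeros(data.shape[1])

    def get_parameters(self, config):
        return [self.weights]

    def fit(self, parameters, config):
        # Local "training": a team's real (privacy-preserving) update goes here.
        self.weights = parameters[0] + self.data.mean(axis=0)
        return [self.weights], len(self.data), {}

    def evaluate(self, parameters, config):
        loss = float(np.linalg.norm(parameters[0]))  # placeholder metric
        return loss, len(self.data), {}


def client_fn(cid: str):
    # The virtual client engine constructs one client per simulated partition.
    # Newer Flower versions may require returning SketchClient(...).to_client().
    rng = np.random.default_rng(int(cid))
    return SketchClient(rng.normal(size=(100, 8)))


if __name__ == "__main__":
    # Local smoke test only; the challenge harness supplies its own runner,
    # Strategy configuration, and data partitions.
    fl.simulation.start_simulation(
        client_fn=client_fn,
        num_clients=3,
        config=fl.server.ServerConfig(num_rounds=2),
        strategy=fl.server.strategy.FedAvg(min_available_clients=3),
    )
```

Server-side aggregation in this sketch reuses the built-in FedAvg Strategy; a subclassed Strategy is typically where techniques such as secure aggregation or noise addition would be hooked in.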

What to submit

For details on what you need to submit for federated code execution, see the Federated Code Submission Format page for this track.

Centralized Code Execution

You will also submit code for a centralized version of your solution to the challenge platform. The evaluation harness will run training and inference on a centralized version of the evaluation dataset. Runtime and accuracy metrics resulting from the evaluation run will be captured and incorporated as part of the overall evaluation of Phase 2 solutions.

What to submit

For details on what you need to submit for centralized code execution, see the Centralized Code Submission Format page for this track.

Documentation Submissions


Technical Paper

The technical paper builds on the concept paper from Phase 1. Teams will be expected to:

  • Update your threat model, technical approach, and privacy proofs to reflect any refinements to your solution. Changes should be relatively minor and should not alter the fundamentals of your solution.
  • Describe your centralized solution, including any justifications as needed for how architecture choices and training parameters make it an appropriate baseline to compare against your federated solution.
  • Add self-reported privacy, accuracy, efficiency, and scalability metrics from local experimentation, including documentation about the experimentation environment.

You will be allowed an additional 4 pages to update your concept paper. Therefore, updated technical papers shall not exceed 14 pages total, not including references.

Successful papers will include the following sections. Please note the additional "Experimental Results" section that was not present in the Phase 1 concept paper specifications.

  1. Title
    The title of your submission, matching the abstract.
  2. Abstract
    A brief description of the proposed privacy mechanisms and federated model.
  3. Background
    The background should clearly articulate the selected track(s) the solution addresses, understanding of the problem, and opportunities for privacy technology within the current state of the art.
  4. Threat Model
    This threat model section should clearly state the threat models considered, and any related assumptions, including:
    • the risks associated with the considered threat models, and how these are addressed through the design and implementation of technical mitigations in your solution
    • how your solution will mitigate the defined threat models
    • whether technical innovations introduced in your proposed solution may introduce novel privacy vulnerabilities
    • relevant established privacy and security vulnerabilities and attacks, including any best practice mitigations
  5. Technical Approach
    The approach section should clearly describe the technical approaches used and list any privacy issues specific to the technological approaches. Successful submissions should clearly articulate:
    • the design of any algorithms, protocols, etc. utilized
    • justifications for enhancements or novelties compared to the current state-of-the-art
    • the expected accuracy and performance of the model, including a comparison to the centralized baseline model
    • the expected efficiency and scalability of the privacy solution with respect to number of partitions and dataset size
    • the expected tradeoffs between privacy and accuracy/utility
    • how the explainability of the model might be impacted by the privacy solution
  6. Proof of Privacy
    The proof of privacy section should include formal or informal evidence-based arguments for how the solution will provide privacy guarantees while ensuring high utility. Successful papers will directly address the privacy vs. utility trade-off.
  7. Experimental Results
    The results section should include experimental privacy, accuracy, efficiency, and scalability metrics based on the development dataset. Your experimental results will be used to help determine your scores for the Privacy, Accuracy, and Efficiency/Scalability criteria. We suggest using the following metrics to measure performance in these categories:

    • Privacy: privacy parameters (e.g. ε and δ for differential privacy); success rate of membership inference attack
    • Accuracy: area under the precision-recall curve (AUPRC)
    • Efficiency: total execution time; computation time for each party; maximum memory usage for each party; communication cost for each party
    • Scalability: change in execution time and computation/communication as number of partitions increases

    We ask that your experimental results include the following:

    a. Privacy–accuracy tradeoff: Please define three privacy scenarios (strong, moderate, weak) as applicable to your solution, and report accuracy metrics (at a minimum AUPRC) corresponding to these scenarios to demonstrate the tradeoff. Two possible ways of defining the privacy scenarios appear in the table below: one for differentially private solutions, and one that leverages membership inference advantage for solutions without theoretical guarantees (a minimal sketch for estimating this advantage empirically appears after this section outline). Please define your scenarios clearly and use the strongest possible definitions that apply to your solution (i.e., report privacy parameters for theoretical bounds when possible).

    |          | Differential Privacy | Membership Inference |
    |----------|----------------------|----------------------|
    | Strong   | ε ≈ 0                | Adv ≈ 0              |
    | Moderate | ε ≈ 1                | Adv ≤ 0.1            |
    | Weak     | ε ≈ 5                | Adv ≤ 0.2            |

    b. Additional results that support your solution with respect to the evaluation criteria.

  8. Data
    The data section should describe how the solution will cater to the types of data provided and articulate what additional work may be needed to generalize the solution to other types of data or models.

  9. Team Introduction
    An introduction to yourself and your team members (if applicable) that briefly details background and expertise. Optionally, you may explain your interest in the problem.
  10. References
    A reference section.
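
As referenced in the Experimental Results section above, the sketch below shows one way a team might estimate membership inference advantage empirically. It assumes the common definition of advantage as the attack's true positive rate minus its false positive rate, applied to a simple loss-threshold attack (in the style of Yeom et al.); the challenge does not mandate this particular attack or definition, and the names below are illustrative.

```python
# A minimal sketch of estimating membership inference advantage (Adv) empirically,
# assuming Adv = TPR - FPR of an attack that predicts "member" whenever the
# per-example loss falls below a threshold. Illustrative only, not a required method.
import numpy as np


def membership_inference_advantage(member_losses, nonmember_losses):
    """Best achievable TPR - FPR over all thresholds of a loss-threshold attack."""
    member_losses = np.asarray(member_losses, dtype=float)
    nonmember_losses = np.asarray(nonmember_losses, dtype=float)
    losses = np.concatenate([member_losses, nonmember_losses])
    is_member = np.concatenate(
        [np.ones(len(member_losses)), np.zeros(len(nonmember_losses))]
    )
    best_adv = 0.0
    for threshold in np.unique(losses):
        predicted_member = losses <= threshold  # low loss -> guess "member"
        tpr = predicted_member[is_member == 1].mean()
        fpr = predicted_member[is_member == 0].mean()
        best_adv = max(best_adv, tpr - fpr)
    return float(best_adv)


# Hypothetical usage with per-example losses collected during local experiments:
# adv = membership_inference_advantage(train_losses, holdout_losses)
```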

Code Guide

You will be required to create a code guide in the style of a README that documents your code. The code guide should explain all of the components of your code and how they correspond to the conceptual elements of your solution. An effective code guide will provide a mapping between the key parts of your technical paper and the relevant parts of your source code. Please keep in mind that reviewers will need to be able to read and understand your code, so follow code readability best practices as much as you are able to when developing your solution.

Evaluation


Solutions should aim to:

  • Provide robust privacy protection for the collaborating parties
  • Minimize loss of overall accuracy in the model
  • Minimize additional computational resources (including compute, memory, and communication) compared to a non-federated learning approach

In addition to this, the evaluation process will reward competitors who:

  • Show a high degree of novelty or innovation
  • Demonstrate how their solution (or parts of it) could be applied or generalized to other use cases
  • Effectively prove or demonstrate the privacy guarantees offered by their solution, in a form that is comprehensible to data owners or regulators
  • Consider how their solution could be applied in a production environment

Rubric

Initial evaluation of the developed solutions will be based on a combination of quantitative metrics and qualitative assessments by judges, according to the following criteria:

| Topic | Factors | Weighting (/100) |
| --- | --- | --- |
| Privacy | Information leakage possible from the PPFL model during training and inference, for a fixed level of model accuracy. Ability to clearly evidence the privacy guarantees offered by the solution in a form accessible to a regulator and/or data owner audience. | 35 |
| Accuracy | Absolute accuracy of the PPFL model developed (e.g., F1 score). Comparative accuracy of the PPFL model compared with a centralized model, for a fixed amount of information leakage. | 20 |
| Efficiency and scalability | Time to train the PPFL model and comparison with the centralized model. Network overhead of model training. Memory (and other temporary storage) overhead of model training. Ability to demonstrate scalability of the overall approach taken for additional nodes. | 20 |
| Adaptability | Range of different use cases that the solution could potentially be applied to, beyond the scope of the current challenge. | 5 |
| Usability and Explainability | Level of effort to translate the solution into one that could be successfully deployed in a real-world environment. Extent and ease with which privacy parameters can be tuned. Ability to demonstrate that the solution implementation preserves any explainability of model outputs. | 10 |
| Innovation | Demonstrated advancement in the state of the art of privacy technology, informed by the accuracy, privacy, and efficiency factors described above. | 10 |

Phase 2 may include one round of interaction with the teams so that they can provide any clarification sought by the judges. Comparison bins may be created to compare similar solutions. Solutions should make a case for improvements against existing state-of-the-art solutions.

Track-specific prize finalists will be determined based on the factors above for solutions submitted to each data track. Generalized solutions, which are run on both tracks, will be evaluated by combining the factors above with dedicated judging of the generalizability of the solution, as described on the Challenge Website, as follows:

| Topic | Factors | Weighting (/100) |
| --- | --- | --- |
| Performance on Track A | See table above | 40 |
| Performance on Track B | See table above | 40 |
| Generalizability | Assessment of the technical innovation contributing to the generalizability of the solution to the two use cases, and its potential for other use cases | 20 |


As with Phase 1, the trade-offs among criteria made in the Concept Papers should be reflected in the developed solution. Solutions must meet a minimum threshold of privacy and accuracy, as assessed by judges and measured quantitatively, to be eligible to score points in the remaining criteria.

The top solutions ranked by points awarded for Track A, Track B, and Generalized Solutions will advance to Red Team evaluation as described below. The results of the Red Team evaluation will be used to finalize the scores above in order to determine final rankings.

Accuracy Metrics

The evaluation metric will be Area Under the Precision–Recall Curve (AUPRC), also known as average precision (AP), PR-AUC, or AUCPR. This is a commonly used metric for binary classification that summarizes model performance across all operating thresholds. It rewards models that consistently assign higher confidence scores to anomalous transactions than to non-anomalous transactions.

AUPRC will be evaluated under the following scenarios:

  • Federated Solution with N1...3 partitions (a minimum of three different partitioning schemes)
  • Centralized Solution

AUPRC is computed as follows:

$$ \text{AUPRC} = \sum_n (R_n - R_{n-1}) P_n $$

where $P_n$ and $R_n$ are the precision and recall, respectively, when thresholding at the $n$th individual transaction sorted in order of increasing recall.
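
For local experimentation, the sum above can be computed directly from a precision–recall curve; scikit-learn's average_precision_score implements the same step-wise estimator. Below is a minimal sketch with arbitrary toy labels and scores:

```python
# A minimal sketch of computing AUPRC as the step-wise sum defined above;
# scikit-learn's average_precision_score uses the same estimator. Toy data only.
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve


def auprc(y_true, y_score):
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    # precision_recall_curve returns points in order of decreasing recall,
    # so reverse both arrays to sum over increasing recall.
    precision, recall = precision[::-1], recall[::-1]
    return float(np.sum((recall[1:] - recall[:-1]) * precision[1:]))


y_true = np.array([0, 0, 1, 1, 0, 1])                # 1 = anomalous transaction
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])  # model confidence scores
print(auprc(y_true, y_score))                        # manual step-wise sum
print(average_precision_score(y_true, y_score))      # same value from scikit-learn
```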

Computational Metrics

In addition, metrics will be calculated at runtime to empirically assess performance, efficiency, and scalability. These metrics may include, but are not limited to:

  • Total Training Time for Federated Solution with N1...3 partitions
  • Total Training Time for Centralized Solution
  • Peak Training Memory Usage for Federated Solution with N1...3 partitions
  • Peak Training Memory Usage for Centralized Solution
  • Total Training "Network" Disk Volume for Federated Solution with N1...3 partitions
  • Total Training "Network" File Number for Federated Solution with N1...3 partitions
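
For local experimentation ahead of submission, wall-clock time and peak Python heap usage can be captured with the standard library, as sketched below. This is only an assumption about how a team might measure locally, not how the harness instruments runs; run_training is a hypothetical placeholder, and tracemalloc tracks only allocations made through Python, not native buffers such as framework tensors.

```python
# A rough local-measurement sketch; the evaluation harness collects its own metrics.
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Return (result, wall_seconds, peak_python_heap_bytes) for one call to fn."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # Python-level allocations only
    tracemalloc.stop()
    return result, elapsed, peak


# Hypothetical usage: model, seconds, peak_bytes = measure(run_training, partitions=3)
```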

Red Team Evaluation

The objective of Phase 3 is to test the strength of the privacy-preserving techniques of the developed PPFL models through a series of privacy audits and attacks. Red Team Participants will plan and launch audits and attacks against the highest-scoring solutions developed during Phase 2.

Frequently Asked Questions


What is the role of the common execution runtime?

Phase 2 of the PETs Prize challenge includes the submission of a containerized implementation of participant solutions to a common execution runtime and infrastructure. This runtime provides a common environment and logic for the testing, evaluation, and benchmarking of the solutions. Centralized and federated model submissions will be run on a separate, unseen dataset.

The quantitative results derived from this testing will be one part of the overall evaluation of solutions. Participants will also submit an updated technical paper including their own experimental privacy, accuracy, and efficiency metrics for the two models. Final scores will be determined by a panel of judges. See the evaluation section in each data track for further information.

How is Flower used in Phase 2 code execution? Why is it needed? Why isn't a light-weight Docker container sufficient?

Flower is a customizable federated learning library that is agnostic to specific machine learning frameworks. The federated evaluation harness uses Flower as an API specification and as the execution engine for simulating the federated learning workflow.

Setting a standard API and simulation engine has a few benefits for the objectives of the PETs Prize Challenge:

  • Comparable metrics for consistent evaluation—The standardization of execution allows challenge organizers to instrument the evaluation workflow in order to collect performance metrics and capture client–server communications in a standardized way. This allows for greater comparability between solution implementations.
  • Facilitation of judging and red team evaluation—The standardized API will make it easier for judges and red teams to review and understand source code. Standardized capture of client–server communications will make it easier for red teams to evaluate privacy attacks that make use of that information. Additionally, judges can have greater confidence that the federation structure of solutions is properly implemented.
  • Focus on privacy techniques—The objective of the challenge is to drive innovation in privacy technologies. Teams can focus on the design and implementation of their privacy techniques, as the federated learning simulation is handled by a provided standard implementation.

The challenge organizers recognize that there are inherent tradeoffs in how the evaluation harness is designed. This design has been chosen to balance those tradeoffs in achieving the challenge's objectives.

Is this a challenge for Flower-based solutions? What if I have my own federated learning framework?

No, the challenge is for general development of privacy-preserving federated learning solutions.

One part of how solutions are evaluated is having an implementation that is tested, evaluated, and benchmarked in a standardized evaluation runtime. This submitted implementation must follow the standardized API specifications based on Flower. If you have another federated learning framework that you would like to use as part of the submitted implementation, you can wrap your code with the Flower API.
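
As one illustration of such wrapping, the sketch below exposes a scikit-learn model through Flower's NumPyClient interface by translating its parameters to and from lists of NumPy arrays. The model choice, class name, and placeholder metric are assumptions for illustration, not part of the challenge's API specification.

```python
# A minimal sketch of wrapping a non-Flower training stack (here scikit-learn)
# behind a Flower-style client API. Illustrative assumptions throughout.
import numpy as np
import flwr as fl
from sklearn.linear_model import SGDClassifier


class WrappedSklearnClient(fl.client.NumPyClient):
    def __init__(self, X: np.ndarray, y: np.ndarray):
        self.X, self.y = X, y
        self.model = SGDClassifier(loss="log_loss")
        # One tiny partial_fit so coef_/intercept_ exist before the first round.
        self.model.partial_fit(X[:2], y[:2], classes=np.array([0, 1]))

    def get_parameters(self, config):
        # Expose framework-specific parameters as a list of NumPy arrays.
        return [self.model.coef_, self.model.intercept_]

    def fit(self, parameters, config):
        self.model.coef_, self.model.intercept_ = parameters
        self.model.partial_fit(self.X, self.y, classes=np.array([0, 1]))
        return [self.model.coef_, self.model.intercept_], len(self.X), {}

    def evaluate(self, parameters, config):
        self.model.coef_, self.model.intercept_ = parameters
        scores = self.model.predict_proba(self.X)[:, 1]
        # Placeholder local metric; official AUPRC is computed by the harness.
        return 0.0, len(self.X), {"mean_score": float(scores.mean())}
```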

You may have other implementations of your solution that are not submitted for code execution and exclusively make use of your own framework or another framework. If such an implementation demonstrates additional strengths of your solution, you should discuss it and include experimental results as part of your technical paper. Keep in mind that your report should clearly articulate and defend the benefits of your solution. See the evaluation criteria for further information.

The evaluation does not support client peer-to-peer communication. What do I do if my federated solution uses decentralized federated learning?

You can still implement communication between clients by routing messages through the server as a mediator. If this has an impact on the communication efficiency or the privacy of your solution, you should clearly explain this in your technical paper as part of your threat model definition. You should also include any relevant experimental results in your technical paper. Judges and red teams will take this into account when reviewing your solution during final evaluation.

What access do Red Teams have to my solution?

Red Teams will have access to the concept papers submitted by Blue Teams in Phase 1, and will also be provided with submitted solutions (including source code) from Phase 2 for the finalists selected to advance to Red Team testing.

Intellectual property considerations for Blue Team submissions are discussed in the "Submission & Intellectual Property Rights" section of the challenge rules:

As detailed in these rules, Phase 3 of the Challenge includes the disclosure of Blue Team Participants’ Phase 1 and 2 submissions to Red Team Participants, and by submitting an entry to any phase of the Challenge, Blue Team Participants acknowledge and agree to such disclosure. Accordingly, Blue Team Participants may wish to take appropriate measures to protect any intellectual property contained within their submissions, and such protection should be sought prior to the entry of such submissions into the Challenge.

Please review the official challenge rules for a complete discussion of this topic.

All Red Teams are required to sign a non-disclosure agreement as a condition for participation. You can find a copy of the agreement here.

How can I include software dependencies that my solution depends on?

The primary way that software dependencies are made available to solutions is to include them as part of the runtime container image. Please see here for instructions on opening a pull request to add dependencies to the runtime image.

Vendoring software dependencies by including them as part of your submission is also an available option for dependencies that do not make sense to include as part of building the runtime image. You can learn more about how Python's module search path works from the Python documentation or this guide.
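
As one concrete illustration of the vendoring option, a submission could ship a pure-Python dependency inside the archive and prepend its directory to the module search path before importing it. The directory name and import below are hypothetical, not a required layout.

```python
# A minimal sketch, assuming the submission ships vendored pure-Python dependencies
# in a hypothetical vendored/ directory next to this file.
import sys
from pathlib import Path

VENDOR_DIR = Path(__file__).resolve().parent / "vendored"
sys.path.insert(0, str(VENDOR_DIR))

# import some_vendored_package  # hypothetical package shipped with the submission
```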

Please note that containers will not have any network access when running your code during the evaluation process.

What should I do if the design of the standardized evaluation has an impact on the performance of my solution that otherwise wouldn't apply in other deployment circumstances?

Please use the technical paper to describe any impact that the evaluation process has on your solution that you believe would not be applicable under other deployment circumstances. You should include any relevant results from local experimentation. The judges will consider such claims and justifications as part of the evaluation.

Good luck


Good luck and enjoy this problem! For more details on the code submission format, visit the code submission page. If you have any questions, you can always ask the community by visiting the DrivenData user forum or the cross-U.S.–U.K. public Slack channel. You can request access to the Slack channel here.