Differential Privacy Temporal Map Challenge: Sprint 1 (Prescreened Arena)

CALLING PRESCREENED PARTICIPANTS! Help public safety agencies share data while protecting privacy. If you haven't been prescreened yet, head on over to the Open Arena to learn more and get started.

$29,000 in prizes
Jan 2021
26 joined

Differential Privacy Temporal Map Challenge


The goal of this challenge is to develop algorithms that preserve data utility as much as possible while guaranteeing individual privacy is protected. The challenge features a series of coding sprints to apply differential privacy methods to temporal map data, where each record is tied to a location and each individual may contribute to a sequence of events.

Why

Large data sets containing personally identifiable information (PII) are exceptionally valuable resources for research and policy analysis in fields supporting America's First Responders, such as emergency planning and epidemiology.

Temporal map data is of particular interest to the public safety community for applications such as optimizing response times and personnel placement, natural disaster response, epidemic tracking, and demographic and civic planning. Yet the ability to track a person's location over a period of time presents particularly serious privacy concerns.

The Solution

Sprint 1 featured data on 911 calls in Baltimore made over the course of a year. Participants needed to build de-identification algorithms for generating privatized datasets that reported monthly incident counts for each type of incident by neighborhood.

The temporal sequence aspect of the problem is especially challenging because it allows a single individual to contribute to many events (up to 20). This raises the sensitivity of the count queries and, with it, the amount of noise that must be added.
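To see why sensitivity matters, consider the standard Laplace mechanism, which adds noise with scale sensitivity/epsilon to each released count. The sketch below is a generic illustration, not the challenge's actual mechanism; the function name and signature are hypothetical.

```python
import numpy as np

def laplace_noisy_count(true_count, sensitivity, epsilon, rng=None):
    """Release a count with Laplace noise of scale sensitivity / epsilon.

    Generic sketch of the Laplace mechanism; not taken from any
    competitor's solution.
    """
    rng = rng or np.random.default_rng()
    # Noise scale grows linearly with sensitivity: if one person can
    # appear in up to 20 events, each count needs 20x the noise of a
    # query where each person contributes at most one record.
    return true_count + rng.laplace(0.0, sensitivity / epsilon)
```

With sensitivity 20 and the same epsilon, the noise standard deviation is twenty times larger than for a sensitivity-1 counting query, which is exactly why competitors worked to cap each person's contribution.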

The Results

Many techniques from the DP literature are not designed to handle high sensitivity. To overcome this, winning competitors took several creative approaches:

  • Subsampling: Use at most k records from each person, reducing sensitivity to k.
  • Preprocessing: Combine subsampling with shrinking the data space by eliminating infrequent incident codes.
  • Post-processing: Clean up the noisy data with optimization, smoothing, and denoising strategies (several clever approaches were used; see the solution descriptions in the post below).

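The subsampling idea above can be sketched end to end: cap each person at k records, tabulate counts per neighborhood/month/incident-type cell, then add Laplace noise with the reduced sensitivity k. This is a minimal illustration under assumed field names (`person_id`, `neighborhood`, `month`, `incident_type`), not any winner's actual code.

```python
from collections import defaultdict
import numpy as np

def subsample_per_person(records, k, rng=None):
    """Keep at most k records per individual (hypothetical schema)."""
    rng = rng or np.random.default_rng()
    by_person = defaultdict(list)
    for rec in records:
        by_person[rec["person_id"]].append(rec)
    kept = []
    for recs in by_person.values():
        if len(recs) > k:
            idx = rng.choice(len(recs), size=k, replace=False)
            recs = [recs[i] for i in idx]
        kept.extend(recs)
    return kept

def privatized_counts(records, k, epsilon, rng=None):
    """Noisy per-cell counts after capping each person's contribution at k."""
    rng = rng or np.random.default_rng()
    sub = subsample_per_person(records, k, rng)
    counts = defaultdict(int)
    for rec in sub:
        counts[(rec["neighborhood"], rec["month"], rec["incident_type"])] += 1
    scale = k / epsilon  # sensitivity is now k, not the raw maximum of 20
    return {cell: c + rng.laplace(0.0, scale) for cell, c in counts.items()}
```

The trade-off is deliberate: subsampling throws away real records (a utility loss up front) in exchange for a much smaller noise scale on every released count.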
These solutions were evaluated using a "Pie Chart Evaluation Metric," designed to measure how faithfully each privatization algorithm preserves the most significant patterns in the data within each map/time segment. The first-place winner combined several techniques and tailored their algorithm to the required level of privacy, ultimately achieving the highest utility score on the privatized data.