Differential Privacy Temporal Map Challenge: Sprint 3 (Prescreened Arena)

CALLING PRESCREENED PARTICIPANTS! Help public safety agencies share data while protecting privacy. If you haven't been prescreened yet, head on over to the Open Arena to learn more and get started. #privacy

$79,000 in prizes
Completed Jun 2021
22 joined

Differential Privacy Temporal Map Challenge


The goal of this challenge is to develop algorithms that preserve data utility as much as possible while guaranteeing individual privacy is protected. The challenge features a series of coding sprints to apply differential privacy methods to temporal map data, where each record is tied to a location and each individual may contribute to a sequence of events.

Why

Large data sets containing personally identifiable information (PII) are exceptionally valuable resources for research and policy analysis in a host of fields that support America's first responders, such as emergency planning and epidemiology.

Temporal map data is of particular interest to the public safety community in applications such as optimizing response time and personnel placement, natural disaster response, epidemic tracking, demographic data and civic planning. Yet, the ability to track a person's location over a period of time presents particularly serious privacy concerns.

The Solution

Sprint 3 featured records of millions of taxi trips in Chicago. The data included 78 map segments (the community areas where taxis departed and arrived) and 21 time segments (morning, afternoon, and night shifts for each day of the week), along with information on trip time, distance, location, payment, and service provider. An individual taxi could make up to 200 trips in a week.

To succeed in this competition, participants needed to build de-identification algorithms for generating synthetic, privatized taxi records that most accurately preserved the patterns in the original data. Because accuracy was evaluated with respect to all map segments (community areas) and time segments (weekly shifts), it was important that solutions perform well even on the more complex or sparse sections of the data.
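To make the privacy difficulty concrete, here is a minimal sketch of one textbook baseline: a Laplace-noised count for every (map segment, time segment) cell. It is not any competitor's method and not the challenge's scoring code. The record layout `(taxi_id, map_segment, time_segment)`, the zero-based segment indices, and the function names are assumptions made for illustration; only the figures 78, 21, and 200 come from the description above.

```python
import random

NUM_MAP_SEGMENTS = 78      # community areas, per the description above
NUM_TIME_SEGMENTS = 21     # 3 shifts x 7 days of the week
MAX_TRIPS_PER_TAXI = 200   # stated cap on one taxi's trips in a week


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_trip_counts(trips, epsilon, rng=None):
    """Return a noisy trip count for every (map segment, time segment) cell.

    trips: iterable of (taxi_id, map_segment, time_segment) tuples with
    zero-based segment indices (a hypothetical layout for this sketch).
    """
    rng = rng or random.Random()
    counts = [[0] * NUM_TIME_SEGMENTS for _ in range(NUM_MAP_SEGMENTS)]
    trips_seen = {}
    for taxi_id, area, shift in trips:
        # Enforce the per-taxi cap so the sensitivity bound really holds.
        trips_seen[taxi_id] = trips_seen.get(taxi_id, 0) + 1
        if trips_seen[taxi_id] > MAX_TRIPS_PER_TAXI:
            continue
        counts[area][shift] += 1
    # Adding or removing one taxi changes the histogram by at most
    # MAX_TRIPS_PER_TAXI in L1 norm, so the noise scales with that cap.
    scale = MAX_TRIPS_PER_TAXI / epsilon
    return [[c + laplace_noise(scale, rng) for c in row] for row in counts]
```

Because privacy is protected per taxi rather than per trip, the noise scale here is 200 / epsilon per cell. At small epsilon values that noise can swamp sparse cells, which illustrates why a naive approach like this one struggles on the sparse sections of the data.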

The Results

When the NIST synthetic data challenges started in 2018, there was skepticism as to whether synthetic data was feasible with differential privacy. Over the course of the 6 sprints across two challenges, we have seen the competitors rise to the occasion, discover unexpectedly powerful tricks and techniques, and address problems such as large, complex feature spaces, sparse data, high-sensitivity queries (temporal data), heterogeneous map segments, small epsilon values, edit constraints, and all the complexities of real-world data.

Not only did they perform well on these real-world problems; top contestants in this final sprint also demonstrated algorithms that produce records with both more privacy and greater accuracy than the typical subsampling techniques used by many government agencies to release records. These results hold immediate promise for public safety communities and other groups interested in data sharing with the formal privacy guarantees offered by differential privacy.

Check out the results announcement below and learn more about the winners and their leaderboard-topping approaches!