Differential Privacy Temporal Map Challenge: Sprint 2 (Open Arena)

Help public safety agencies share data while protecting privacy. This is part of a series of contests to develop algorithms that preserve the usefulness of temporal map data while guaranteeing individual privacy is protected.


Differential Privacy Temporal Map Challenge


The goal of this challenge is to develop algorithms that preserve data utility as much as possible while guaranteeing individual privacy is protected. The challenge features a series of coding sprints to apply differential privacy methods to temporal map data, where each record is tied to a location and each individual may contribute to a sequence of events.

Why

Large data sets containing personally identifiable information (PII) are exceptionally valuable resources for research and policy analysis in a host of fields supporting America's First Responders, such as emergency planning and epidemiology.

Temporal map data is of particular interest to the public safety community in applications such as optimizing response times and personnel placement, natural disaster response, epidemic tracking, demographic analysis, and civic planning. Yet the ability to track a person's location over a period of time presents particularly serious privacy concerns.

The Solution

Sprint 2 featured census data about simulated individuals in various US states from 2012 to 2018, including 35 different survey features in addition to the identifier of the simulated individual each record belonged to. Records for the same individual were also linked across years, making it even more challenging to ensure that each person's privacy was protected.

To succeed in this competition, participants needed to build de-identification algorithms that generated a set of synthetic, privatized survey records preserving the patterns in the original data as accurately as possible. Because accuracy was evaluated with respect to all map segments (Public Use Microdata Areas, or PUMAs) and time segments (years), it was important that solutions perform well even on the more complex or sparse sections of the data.

The Results

Winning competitors found that general techniques that had proven effective in the 2018-2019 Synthetic Data Challenge (DeID1) were also useful here, although advances were necessary to address the difficulties imposed by the temporal map context.

  • Probabilistic Graphical Models (PGM): Both N-CRiPT and Minutemen used PGM-based approaches, which were also used by the first-place winner of the 2018-2019 challenge. Each team tailored their approach and applied new techniques to deal with the increased sensitivity of the temporal data.
  • Noisy Marginals: On the high epsilon setting, both DukeDP and DPSyn used approaches which first collected a set of marginal counts from the data, then privatized the counts with added noise, and finally used post-processing and sampling from the public data to produce a final consistent synthetic data set (a minimal sketch of this measure-noise-sample pattern follows this list). DPSyn also used a similar technique when they earned second place in the 2018-2019 challenge. Each team introduced new strategies to handle the increased difficulty of the problem, including new approaches to creating the final consistent data.
  • Histograms: In the 2018-2019 challenge, the DPFields solution partitioned the variables in the schema, collected (and privatized) histograms over each partition, and then used these noisy histograms as probability distributions for sampling the synthetic data. Jim King used a similar approach, carefully tailoring the choice of histograms and the level of aggregation to the more challenging temporal map problem.
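
The noisy-marginals and histogram entries above share a common measure-noise-postprocess-sample pattern. The following is a minimal Python sketch of that pattern, not any competitor's actual solution: the column names (PUMA, YEAR, SEX), the single marginal, the choice of the Laplace mechanism, and the assumption of sensitivity 1 per record are simplifications for illustration. Real entries also had to account for individuals contributing linked records across years, which raises the sensitivity and the noise required.

```python
import numpy as np
import pandas as pd

def noisy_marginal(df, cols, epsilon):
    """Measure one marginal over `cols`, add Laplace noise, and
    post-process the result into a valid probability distribution.
    Sensitivity is assumed to be 1 per record, a simplification."""
    counts = df.groupby(cols).size()
    noisy = counts.to_numpy(dtype=float) + np.random.laplace(scale=1.0 / epsilon, size=len(counts))
    noisy = np.clip(noisy, 0.0, None)  # post-process: remove negative counts
    return pd.Series(noisy / noisy.sum(), index=counts.index)

def synthesize(df, cols, epsilon, n_records):
    """Sample synthetic records from the privatized marginal."""
    dist = noisy_marginal(df, cols, epsilon)
    picks = np.random.choice(len(dist), size=n_records, p=dist.to_numpy())
    return pd.DataFrame(list(dist.index[picks]), columns=cols)

# Toy example: a single marginal over hypothetical (PUMA, YEAR, SEX) columns.
raw = pd.DataFrame({
    "PUMA": np.random.choice(["01-0101", "01-0102"], size=1000),
    "YEAR": np.random.choice([2012, 2013], size=1000),
    "SEX": np.random.choice([1, 2], size=1000),
})
synthetic = synthesize(raw, ["PUMA", "YEAR", "SEX"], epsilon=1.0, n_records=1000)
print(synthetic.head())
```

In practice, the winning teams measured many marginals or histograms, split their privacy budget across those measurements, and relied on much more elaborate post-processing to keep the final synthetic data set consistent.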

These solutions were evaluated using the "K-marginal Evaluation Metric", designed to measure how faithfully each privatization algorithm preserves the most significant patterns across all features in the data within each map/time segment. One consequence of the increased query sensitivity in the temporal map challenge is that solutions must interact with the data strategically to get the most value from each query. Techniques that made efficient use of their queries performed well, especially at higher epsilons.
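
As a rough illustration of how such a metric can work, here is a hedged sketch of a K-marginal style comparison: repeatedly pick K features at random and, within each (PUMA, YEAR) segment, compare the joint distribution of those features in the original and synthetic data. The column names, the number of trials, and the use of total variation distance for the per-marginal comparison are assumptions of this sketch rather than the official metric's definition.

```python
import numpy as np
import pandas as pd

def k_marginal_score(original, synthetic, feature_cols, k=2, n_trials=50, seed=0):
    """Average total variation distance between original and synthetic
    K-marginals, computed separately within every (PUMA, YEAR) segment.
    Lower is better in this sketch; the official metric rescales scores."""
    rng = np.random.default_rng(seed)
    distances = []
    for puma, year in original[["PUMA", "YEAR"]].drop_duplicates().itertuples(index=False):
        orig_seg = original[(original.PUMA == puma) & (original.YEAR == year)]
        synth_seg = synthetic[(synthetic.PUMA == puma) & (synthetic.YEAR == year)]
        for _ in range(n_trials):
            # Pick K features at random and compare their joint distributions.
            cols = list(rng.choice(feature_cols, size=k, replace=False))
            p = orig_seg.groupby(cols).size() / max(len(orig_seg), 1)
            q = synth_seg.groupby(cols).size() / max(len(synth_seg), 1)
            p, q = p.align(q, fill_value=0.0)  # align on a common index
            distances.append(0.5 * float(np.abs(p - q).sum()))
    return float(np.mean(distances)) if distances else float("nan")
```

Because the score averages over every map and time segment, an algorithm that only captures the dense, well-populated segments is penalized on the sparse ones.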