NASA Airathon: Predict Air Quality (Particulate Track)

Hosted by NASA


Image: Poor air quality in Los Angeles, 2005.

Exposure to air pollution is the top environmental risk factor for premature death, but millions of people across the globe don't have access to reliable data about their current local air quality. Algorithms that combine data from satellites with ground monitors are critical to filling this information gap.

— Abbey Nastan, MAIA Deputy Program Applications Lead, NASA's Jet Propulsion Laboratory in Southern California


Air pollution is one of the greatest environmental threats to human health. Currently, no single satellite instrument provides ready-to-use, high-resolution information on surface-level air pollutants, while existing high-quality ground monitors are expensive and have large gaps in coverage. This gap in information means that millions of people cannot take daily action to protect their health.

The Solution

Models that make use of widely available satellite data have the potential to provide local, daily air quality information. The goal of this challenge was to use remote sensing data and other geospatial data sources to estimate daily levels of air pollution with high spatial resolution (5 km by 5 km). This competition focused on two critical air quality measures: particulate matter less than 2.5 micrometers in size (PM2.5) and nitrogen dioxide (NO2).

To train and evaluate solutions, data was provided for three urban geographies: Los Angeles, Delhi, and Taipei. These locations have readily available satellite data but varying levels of pollution and historical data.

The Results

This challenge drew more than 1,200 submissions from over 1,000 participants! The top models made significant gains over benchmark measures, achieving R-squared values of 0.81 for PM2.5 and 0.48 for NO2, compared with benchmark R-squared values of 0.44 and 0.03, respectively, that reflect inter-location variability (shown below).
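For readers unfamiliar with the metric, the sketch below shows how an R-squared score can be computed, along with one possible reading of a benchmark based on inter-location variability: predicting every observation as the mean of its own location. That reading is an assumption, not a description of the competition's actual benchmark, and the readings and location codes are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def location_mean_baseline(locations, y_true):
    """Hypothetical benchmark: predict each observation as its location's mean."""
    by_location = defaultdict(list)
    for loc, y in zip(locations, y_true):
        by_location[loc].append(y)
    loc_mean = {loc: mean(values) for loc, values in by_location.items()}
    return [loc_mean[loc] for loc in locations]


# Made-up PM2.5 readings for three hypothetical location codes.
locations = ["la", "la", "dl", "dl", "tpe", "tpe"]
pm25 = [10.0, 14.0, 90.0, 110.0, 20.0, 26.0]

baseline_pred = location_mean_baseline(locations, pm25)
baseline_r2 = r_squared(pm25, baseline_pred)
print(f"location-mean baseline R-squared: {baseline_r2:.3f}")
```

A model that only knows which location an observation came from scores 0 within each location but can still score well overall when locations differ a lot, which is why a location-aware benchmark is a more demanding comparison than a single global mean.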

The winning solutions used imputation and stratified training techniques, combined with the data sources they found to be most useful, to handle the sparsity of satellite data and the vastly different distributions of pollutants in each location.
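As a rough illustration of these two ideas, the sketch below fills missing satellite values with a per-location median and then groups rows by location so that each location can be modeled on its own. This is a minimal sketch under stated assumptions, not the winners' method: the field names (`location`, `aod`, `pm25`), the median fill rule, and the reading of "stratified" as "split by location" are all hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import median


def impute_by_location(rows, feature):
    """Replace missing (None) values of `feature` with that location's median.

    Falls back to the overall median when a location has no observed values.
    Raises statistics.StatisticsError if the feature is missing everywhere.
    """
    observed = defaultdict(list)
    for row in rows:
        if row[feature] is not None:
            observed[row["location"]].append(row[feature])
    overall = median(v for values in observed.values() for v in values)
    fill = {loc: median(values) for loc, values in observed.items()}
    return [
        {**row, feature: fill.get(row["location"], overall)}
        if row[feature] is None
        else dict(row)
        for row in rows
    ]


def stratify_by_location(rows):
    """Group rows by location so a separate model can be fit per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["location"]].append(row)
    return dict(groups)


# Made-up rows; "aod" stands in for any sparse satellite-derived feature.
rows = [
    {"location": "la", "aod": 0.2, "pm25": 10.0},
    {"location": "la", "aod": None, "pm25": 14.0},
    {"location": "la", "aod": 0.4, "pm25": 16.0},
    {"location": "dl", "aod": 1.0, "pm25": 90.0},
    {"location": "dl", "aod": None, "pm25": 110.0},
]

filled = impute_by_location(rows, "aod")
strata = stratify_by_location(filled)
```

Filling within a location rather than across all data keeps a heavily polluted city's typical values from leaking into a cleaner one, which matters when the distributions differ as much as the text describes.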

Figure: R-squared scores for a benchmark and the competition winners.

See the results announcement for more information on the winning approaches. These solutions are made available under an open-source license for anyone to use and learn from.