Youth Mental Health Narratives: Novel Variables

Discover novel trends from narratives about youth suicides using machine learning techniques. #health

$25,000 in prizes
Completed Nov 2024
372 joined

Why

Suicide is one of the leading causes of death in the United States among 5- to 24-year-olds. Researchers and policymakers study the circumstances of youth suicides to better understand them and reduce their occurrence. One key source of information is the National Violent Death Reporting System (NVDRS). The NVDRS captures information about violent deaths across the United States, abstracted from sources including law enforcement reports, coroner/medical examiner reports, toxicology reports, and death certificates.

The NVDRS contains narrative summaries drawing from those sources as well as standard variables that are useful to researchers. The process of recording standardized variables is time-consuming and prone to human error.

The Solution

The goal of this challenge was to improve both the quality and coverage of standard variables in the NVDRS. Higher-quality data can enable researchers across the country to better understand and prevent youth suicides on a national scale.

In the Automated Abstraction track, participants created machine learning models to generate standard variables from NVDRS narratives. Participants trained models on 4,000 narratives from the NVDRS, and submitted executable code to generate predictions on 1,000 test set narratives.

In the Novel Variables track, participants explored the narratives to suggest new standard variables that could advance youth mental health research. Submissions consisted of a qualitative writeup describing the suggested new variables, any motivation based on existing research, and the methodologies used to study the data.

The Results

Over 750 participants from 81 countries engaged with NVDRS data as part of this challenge. Across both tracks, large language models (LLMs) were by far the most popular and effective tools. In the Automated Abstraction track, DrivenData kicked off the competition by posting a simple LLM prompting benchmark that achieved an F1 score of 56%. More than 50 participants beat the benchmark, with top solutions achieving an F1 score of over 86%. In the Novel Variables track, participants studied topics like social media use, video games, gender, sexuality, and sleep.
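
For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for a single binary variable. It is only an illustration: the labels are made up, the function name is hypothetical, and the competition's exact scoring (for example, how scores were averaged across variables) is not described here and may differ.

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive predictions: define F1 as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for one hypothetical binary variable (not competition data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

In this toy case there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1 score is 0.75, or 75% on the scale used above.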

See the results announcements for more information on the winning submissions and the teams who developed them. All of the prize-winning submissions and write-ups from this competition are linked below and available for anyone to use and learn from.

The data provided for the competition is highly confidential and sensitive. To respect the privacy of individuals in the data, participants should delete any raw competition data that they have stored.


RESULTS ANNOUNCEMENT + MEET THE WINNERS

WINNING SUBMISSIONS ON GITHUB