Youth Mental Health Narratives: Automated Abstraction

Apply machine learning techniques to automate the process of data abstraction from youth suicide narratives.

$45,000 in prizes
Completed Nov 2024
588 joined

About


This challenge is administered by DrivenData on behalf of CDC’s National Center for Injury Prevention and Control (NCIPC), which helps Americans stay safe and healthy by uncovering insights that could reduce the severity and occurrence of violence and injury. Suicide prevention is one key area of study supported by NCIPC. An important source of data for researchers in this area is the National Violent Death Reporting System (NVDRS), which captures information about violent deaths across the United States, including both homicides and suicides.

Machine learning techniques for natural language processing (NLP) have the potential to greatly expand NCIPC’s capacity for analyzing descriptions of suicides in NVDRS by:

  1. Streamlining the data management process and facilitating quality review
  2. Uncovering potentially useful information that is not currently tracked

Learning more about the circumstances of youth suicides can help inform prevention strategies. This challenge was designed to uncover creative and cutting-edge applications of machine learning for mental health care.

About the data

Context, history, and details about the NVDRS dataset are captured in the NVDRS Coding Manual.

NVDRS is managed by the CDC but is built from sources submitted by individual states. Information is derived from law enforcement reports, coroner/medical examiner reports, toxicology reports, and death certificates. State-level data abstractors write summary narratives from these sources and extract information into several standardized fields that are useful to researchers.

Detailed dataset creation process:

  1. States collect original sources: NVDRS is based on confidential reports, including law enforcement and coroner/medical examiner reports, that are collected at the state level and cannot be shared directly with the CDC.
  2. States process original sources: "Abstractors" within state agencies, who have access to the raw data sources, process information from those sources according to the guidelines in the NVDRS Coding Manual.
  3. CDC manages the database: The narratives and standard variables created by abstractors are sent to the CDC. The CDC performs some quality checks on the incoming data, and then integrates it with the larger NVDRS database.

Researchers studying mental health can then request access to the NVDRS data. This competition uses a de-identified sample of NVDRS data.

Research using NVDRS

Below is a short list of papers demonstrating how the NVDRS dataset enables better understanding and protection of youth mental health. The research below provides some useful starting points for approaches in the Novel Variables track of this competition, and illustrates the importance of high-quality and comprehensive NVDRS data.

Research applying machine learning to this problem:

Machine learning has not yet been widely applied to this problem or this dataset, which presents participants with a unique opportunity to move the field forward.