Youth Mental Health Narratives: Automated Abstraction

Apply machine learning techniques to automate the process of data abstraction from youth suicide narratives. #health

$45,000 in prizes
Completed Nov 2024
588 joined

Overview

Suicide is one of the leading causes of death in the United States for 5- to 24-year-olds. To better understand the circumstances around youth suicides and inform potential interventions, researchers and policymakers rely on several narrative datasets. One of these datasets, the National Violent Death Reporting System (NVDRS), includes data abstracted at the state level from multiple sources, including law enforcement and coroner/medical examiner reports, toxicology reports, and death certificates.

However, data abstraction is time-consuming and prone to human error. Data quality assurance is also predominantly manual and labor-intensive.

The objective of the Automated Abstraction track of this challenge is to apply machine learning techniques to automate the population of standard variables from the narrative text in NVDRS in order to help streamline manual abstraction and data quality control. The competition results will inform how youth mental health data is tracked, contributing to more effective research into protecting youth mental health. This competition presents a unique opportunity to work with a dataset that has not been de-identified and made public before. In the Novel Variables track, participants will propose new variables to add to the NVDRS dataset.


Competition End Date:

Nov. 21, 2024, 11:59 p.m. UTC

Place   Prize Amount
1st     $18,000
2nd     $13,000
3rd     $9,000
4th     $5,000

In the event of a tie, winners will be decided by inference time (shortest is best).
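
As a rough illustration of the tie-break rule above, the sketch below ranks entries by score and then by inference time. The `Entry` class, its field names, the sample values, and the assumption that a higher score is better are all hypothetical; this page does not specify the evaluation metric or how the leaderboard is actually computed.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    team: str                 # hypothetical team name
    score: float              # hypothetical leaderboard score (assumed: higher is better)
    inference_seconds: float  # hypothetical measured inference time


def rank(entries):
    """Sort by score (descending), breaking ties by inference time (ascending)."""
    return sorted(entries, key=lambda e: (-e.score, e.inference_seconds))


# Made-up entries: teams "a" and "b" are tied on score, so the faster one ranks first.
entries = [
    Entry("a", 0.86, 1200.0),
    Entry("b", 0.86, 900.0),
    Entry("c", 0.84, 300.0),
]
ranked = rank(entries)
print([e.team for e in ranked])
```

Note that inference time only matters between tied entries: team "c" is the fastest in this made-up example but still ranks below both higher-scoring teams.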

Note on prize eligibility: Federal employees acting within the scope of their employment and federally-funded researchers acting within the scope of their funding are not eligible to win a prize in this challenge.


How to compete

  1. Click the "Compete!" button in the sidebar to enroll in the competition. To sign up, all competitors will be required to sign a Data Sharing Agreement which governs the protection and use of sensitive data in the National Violent Death Reporting System.
  2. Get familiar with the problem through the problem description. You might also want to reference additional resources available on the about page.
  3. Download the data from the data tab.
  4. Create and train your own model. The benchmark blog post is a good place to start.
  5. Bundle your trained model and prediction code for evaluation in our cloud runtime. See the code submission format page for more detail.
  6. Test your submission locally, and in the smoke test environment.
  7. Click “Code jobs” in the sidebar, and then “Make new code submission”. You’re in!

Competition rules

The challenge rules are in place to promote fair competition and useful solutions. If you are ever unsure whether your solution meets the competition rules, ask the challenge organizers in the competition forum or send an email to info@drivendata.org. A few key rules are highlighted below. For more details, see the External Data and Models guidance and the full competition rules.

Use of competition data

When agreeing to the competition terms, all competitors will be required to sign a Data Sharing Agreement which governs the protection and use of sensitive data in the National Violent Death Reporting System. All competitors must abide by the Terms and Conditions of the Data Sharing Agreement to be eligible for access and submission.

Although competition data is de-identified, it is still highly sensitive and confidential. Participants cannot use competition data for any purpose other than the competition, and must delete competition data after the competition has ended.

External model usage

Use of external models is allowed in this competition provided they are freely and publicly available to all participants under a permissive open source license.

However, competition data cannot be shared, duplicated, or published. This includes uploading competition data to any third-party service that will retain the data. For example, participants cannot upload competition data to ChatGPT or Gemini, but can download open-source model weights and run a model locally.


Support

This challenge is organized on behalf of CDC with support from NASA. The contents do not necessarily represent the official views of the CDC.

Statutory authority to conduct the challenge: 15 U.S.C. 3719, Section 1703(a) of the Public Health Service Act (PHSA), 42 U.S.C. 300u-2(a). DrivenData is designing and administering the challenge under contract with the NASA Tournament Lab, under Federal Acquisition Regulation (FAR) procurement regulations authority and in collaboration with CDC’s National Center for Injury Prevention and Control.


Image credits: National Cancer Institute on Unsplash, jannoon028 on Freepik