Youth Mental Health Narratives: Novel Variables

Discover novel trends from narratives about youth suicides using machine learning techniques. #health

$25,000 in prizes
Completed Nov 2024
372 joined

Problem description

The objective of the Youth Mental Health Narratives: Novel Variables challenge is to apply machine learning techniques to extract novel variables from the narrative text in the National Violent Death Reporting System (NVDRS) that could be used to advance youth mental health research.

This is an unsupervised challenge! That means there are no labels for this competition. Participants will explore the data, come up with their own conclusions, and write up their results according to the submission format.

Not sure where to start? Follow the steps in the "How to compete" section of the homepage.

Data


The National Violent Death Reporting System (NVDRS) collects information about violent deaths including homicides, suicides, and deaths caused by law enforcement acting in the line of duty. The sample of NVDRS data provided in this competition includes de-identified information on completed suicides from 2017-2021 among youth aged 5-24.

Information in the NVDRS is primarily derived from law enforcement reports and coroner/medical examiner reports. Data abstractors generate summary narratives from these sources, and extract information into several standardized fields that are useful for researchers. See the About page for more details of the data creation process.

The NVDRS dataset has three main components:

  • Narratives (NarrativeLE and NarrativeCME): Full text summaries of the information in the law enforcement and coroner/medical examiner reports for each case.

  • Standard variables: Fields that capture specific information useful to researchers, such as which common circumstances contributing to suicide are present. Standard variables are extracted from the narratives, and all of them are either binary or categorical.

  • Free text fields: In addition to the narratives, there are a few more free text fields for information that cannot be captured in the standard variables. These fields are CME/LE_CircumstancesOtherText, CME/LE_CrisisOtherDescription, CME/LE_DisclosedToOtherDescription, and CME/LE_HealthProblemTypeDescription.

Your goal in this challenge is to identify new standard variables that would be helpful for researchers. The new variable(s) you identify should capture information that is in the narrative or free text fields, but not yet captured in any of the existing binary or categorical standard variables.

For a detailed description of each field in the data, see the Feature list section.

The data for this competition includes detailed descriptions of youth suicides. These narratives can be upsetting and difficult to read. We encourage you to prioritize your own mental health when deciding whether to work on this challenge and while engaging with the data.

If you or someone you know is struggling with mental health, you can call or text the 988 Suicide & Crisis Lifeline for 24/7, free, and confidential support. The 988 Lifeline website has additional advice and links to specialized resources.

External data and models

Use of external data and models is allowed in this competition provided they are freely and publicly available to all participants under a permissive open source license.

However, participants may not upload competition data to any third party services or APIs that retain the data. For example, participants cannot submit data to ChatGPT or Gemini. Participants can use external models by loading open-source model weights into an environment that they can wipe the data from afterwards, such as their local machine or a cloud compute environment.

Guiding Questions


This is an open-ended challenge. To help you get started, we are providing some guiding lines of inquiry. Feel free to use these or try something completely different!

Examples of useful new variables to identify:

  • Common circumstances or contributing factors that are not yet tracked by the standard variables
  • Additional variables related to a theory about the causes or processes that lead to suicide. For example, what components of an academic theory are not covered by the NVDRS, and how could they be captured? A few references are provided below to understand key theories of suicide.
  • Variables needed to pursue major directions of future or emerging research suggested by existing literature
  • Variables needed to either confirm or contradict the findings of major breakthrough research in the field of mental health

Examples of not useful approaches:

  • Demonstrating analysis that can be done with the existing standard variables. E.g., Examining trends by weapon type that are already captured in standard variables

General tips:

  • Ground your submission in existing literature. You don't need to become an expert in youth mental health research, but you should have some literature-based motivation for your work. The About page has some good articles to start with.
  • Applying machine learning and natural language processing to mental health, and in particular to the NVDRS dataset, is very new. Part of the goal of this challenge is to more broadly uncover the potential of machine learning for mental health research. Advancing our understanding of which methods work well for this dataset and field, and why, is valuable.
  • Reference the NVDRS Coding Manual for additional background about the dataset and details of each field.
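
As one possible starting point for exploring the narratives, the sketch below counts how many documents each word pair (bigram) appears in. It is only an illustrative baseline, not a recommended or official method; the function name, the tokenization rule, and the `min_docs` threshold are assumptions, and the example texts are synthetic placeholders rather than competition data.

```python
import re
from collections import Counter


def bigram_document_frequency(texts, min_docs=2):
    """Return bigrams that occur in at least `min_docs` distinct texts."""
    doc_freq = Counter()
    for text in texts:
        # Very simple tokenizer (an assumption): lowercase alphabetic runs.
        tokens = re.findall(r"[a-z']+", text.lower())
        # Use a set so each bigram is counted once per document.
        doc_freq.update(set(zip(tokens, tokens[1:])))
    return {bigram: n for bigram, n in doc_freq.items() if n >= min_docs}


# Synthetic placeholder texts, not taken from the NVDRS data.
texts = [
    "lost phone privileges after argument",
    "phone privileges were removed",
    "no related text here",
]
frequent = bigram_document_frequency(texts)
```

Recurring phrases found this way are only candidates; each would still need to be checked against the existing standard variables and the coding manual.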

Submission format


You can use either Python or R to conduct your analysis. Submissions must be in the form of a detailed report that describes your work and explains how your submission meets the evaluation criteria. Use the prompts provided below to structure your report to address all of the criteria.

You may have only one submission at a time. To make changes, you can delete and re-upload your submission as many times as you like. Only the last entry submitted will be considered.

Submissions must be a PDF file named submission.pdf that meets the following requirements:

  • 4 pages maximum including figures and tables. Information beyond the page limit will not be considered during evaluation.
  • On paper size 8.5 x 11 inch or A4, with minimum margins of 1 inch
  • Minimum font size of 11
  • Minimum single-line spacing

Submission template

Key findings

  1. What additional variables do you recommend abstracting from the narratives? Drawing from your analysis, briefly explain why each new variable is useful.

Methodology

  1. How did you decide what question, topic, or area related to youth mental health to explore? Please mention any references you considered, including research studies, review articles, published theories, etc.
  2. Please provide a high-level summary of your approach. What did you do (e.g., data preprocessing, key features, algorithms, other novel aspects of your solution)? What tools did you use (e.g., Python, specific open-source packages)? You are encouraged to include a diagram of your data processing and analysis pipeline.
  3. Why did you decide to use these methods? How does your approach advance technical capabilities for studying youth mental health?
  4. How did you evaluate the performance of your approach?
  5. Are there any additional approaches you tried that did not make it into your final workflow (e.g., features, preprocessing steps, model types, etc.)?
  6. Copy and paste snippets of the 3 most impactful parts of your code, and briefly explain what each does.
  7. Are there any other takeaways from your analysis that you would like to share? These do not have to be related to suggesting new variables (e.g., interesting trends or noise in the data, strategies for running your code efficiently, etc.).

Visualizations

  1. Do you have any useful tables, charts, graphs, or visualizations from the process (e.g., exploring the data, testing different features, summarizing model performance, etc.)?

Midpoint feedback and prizes

You will have an opportunity partway through the competition for a midpoint submission. Up to five of the most promising midpoint submissions will receive a prize of $1,000 each. Midpoint submissions will be evaluated using the same criteria as final submissions.

General feedback will be provided about how submissions can better address the judging criteria. If you have specific individual questions, you are encouraged to post them in the competition forum!

Midpoint submissions are due by October 10, 2024. Midpoint submissions must be a PDF file named submission.pdf that meets the following requirements:

  • 2 pages maximum including figures and tables
  • On paper size 8.5 x 11 inch or A4, with minimum margins of 1 inch
  • Minimum font size of 11
  • Minimum single-line spacing

Midpoint submission template

  1. What question, topic, or area related to youth mental health are you planning to explore? How did you decide on this topic? Please mention any references you considered, including research studies, review articles, published theories, etc.
  2. What methods have you experimented with so far (e.g., data preprocessing, key features, algorithms, other novel aspects of your solution)? Why did you decide to try these methods?
  3. What other methods are you planning to test and why?
  4. What variables are you planning to explore extracting from the narratives and why? Do you have any findings so far?
  5. Do you have any useful tables, charts, graphs, or visualizations from the process so far (e.g., exploring the data, testing different features, summarizing model performance, etc.)?

Evaluation


Winners will be selected by a panel of experts on the NVDRS dataset and related research. Submissions will be judged based on the rubric below.

Technical novelty (30%)

  • Does the submission demonstrate creative, cutting-edge, or innovative techniques for applying machine learning to this dataset?
  • Does the submission have potential to expand technical capabilities in the field of youth mental health research?

Insight (25%)

  • Does the submission uncover useful new variables beyond those already abstracted from the narratives (e.g., common circumstances, prevention strategies, drivers, etc.)?
  • Is the submission informed by existing research in the field of youth mental health and suicide, and does it attempt to fill gaps in the field?

Rigor (25%)

  • Is the submission based on appropriate and correctly implemented methods and approaches (e.g., preprocessing, models, evaluation)?

Communication (20%)

  • Are the findings and methodology clearly and effectively communicated?

Please follow the provided submission template to fully address each metric in your submission.
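
To make the rubric weights concrete, the sketch below combines per-criterion scores into a single weighted total. The weights come from the rubric above; the 0-10 scale per criterion, the function name, and the assumption that judges combine scores as a simple weighted sum are illustrative only and are not stated by the organizers.

```python
# Weights taken from the rubric above.
WEIGHTS = {
    "Technical novelty": 0.30,
    "Insight": 0.25,
    "Rigor": 0.25,
    "Communication": 0.20,
}


def weighted_score(scores):
    """Weighted sum of per-criterion scores (scale is an assumption)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 0-10 scale.
example = weighted_score(
    {"Technical novelty": 8, "Insight": 6, "Rigor": 7, "Communication": 9}
)
```

Under these assumptions, technical novelty carries the most weight, so a one-point gain there moves the total more than a one-point gain in communication.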

Feature list


Although the data is de-identified, this sample contains confidential details. As such, participants are not allowed to share the data or use it for any purpose other than this competition. Participants cannot send the data to any 3rd-party API. For example, participants cannot submit data to OpenAI's ChatGPT, Google's Gemini, etc. For more details, see the External Data and Models section and the full competition rules.

Participants can access features.csv on the data download tab. Your analysis must use NarrativeLE or NarrativeCME, but all other fields are optional. NarrativeLE and NarrativeCME together cover almost all of the information that is extracted to the remaining variables.
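
A minimal sketch of reading the narratives is shown below, using only the Python standard library. It assumes features.csv is a standard comma-separated file with a header row containing `uid`, `NarrativeLE`, and `NarrativeCME`; the helper name and the in-memory sample rows are hypothetical placeholders, not real records.

```python
import csv
import io

# Synthetic stand-in for features.csv; the real file has many more columns.
SAMPLE_CSV = """uid,NarrativeLE,NarrativeCME
aaaa,placeholder LE summary,placeholder CME summary
bbbb,,placeholder CME summary only
"""


def load_narratives(handle):
    """Map each uid to its LE and CME narratives joined into one string."""
    combined = {}
    for row in csv.DictReader(handle):
        parts = [row.get("NarrativeLE") or "", row.get("NarrativeCME") or ""]
        # Skip empty narratives so a missing report adds no stray spaces.
        combined[row["uid"]] = " ".join(p.strip() for p in parts if p.strip())
    return combined


narratives = load_narratives(io.StringIO(SAMPLE_CSV))
```

For the real data, the same function could be called on a file handle opened with `open("features.csv", newline="", encoding="utf-8")`, keeping the data on a machine that complies with the competition rules.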

An extensive set of standard variables is provided to illustrate what information relevant to suicide is already captured in the NVDRS. However, there are still many additional fields in NVDRS that are not included in the competition data. To make sure your new variable is not already captured, you can reference the full list of NVDRS standard variables in the NVDRS Coding Manual. You can also search the coding manual for any of the variables listed below to get more details about the field.
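
One rough way to check whether a candidate variable overlaps with the provided standard variables is to count how often a text pattern appears in narratives where none of the related existing flags is set. The sketch below illustrates that idea; the function names, the assumed encodings of a true flag (`1`, `true`, `yes`), the choice of related flags, and the synthetic rows are all assumptions, and a check like this does not replace consulting the coding manual.

```python
import re

TRUE_VALUES = {"1", "true", "yes"}  # assumed encodings of a set flag


def is_set(value):
    """Interpret a raw CSV cell as a boolean flag (encoding is assumed)."""
    return str(value).strip().lower() in TRUE_VALUES


def coverage_gap(rows, pattern, related_flags):
    """Return (matched, uncovered): narratives matching `pattern`, and
    those matches where none of `related_flags` is set."""
    regex = re.compile(pattern, re.IGNORECASE)
    matched = uncovered = 0
    for row in rows:
        text = f"{row.get('NarrativeLE', '')} {row.get('NarrativeCME', '')}"
        if regex.search(text):
            matched += 1
            if not any(is_set(row.get(flag, "")) for flag in related_flags):
                uncovered += 1
    return matched, uncovered


# Synthetic placeholder rows, not taken from the NVDRS data.
rows = [
    {"NarrativeLE": "synthetic text: phone confiscated", "NarrativeCME": "",
     "Argument": "1", "SchoolProblem": "0"},
    {"NarrativeLE": "", "NarrativeCME": "Synthetic text: Phone removed",
     "Argument": "0", "SchoolProblem": "0"},
    {"NarrativeLE": "synthetic unrelated text", "NarrativeCME": "",
     "Argument": "0", "SchoolProblem": "1"},
]
matched, uncovered = coverage_gap(rows, r"\bphone\b", ["Argument", "SchoolProblem"])
```

A high share of uncovered matches suggests the pattern may carry information the existing flags miss, though keyword matching alone will produce both false positives and false negatives.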

Individual identifier

  • uid (str): Unique ID for each individual

Narratives

  • NarrativeLE (str): A summary of the information in the law enforcement (LE) report
  • NarrativeCME (str): A summary of the information in the coroner/medical examiner (CME) report

Narratives are written by state-level abstractors to summarize key information found in law enforcement reports (LE) and coroner/medical examiner reports (CME). Abstractors follow specific instructions from the NVDRS coding manual to generate the narratives (page 24). The coding manual specifies what should be included, what shouldn't be included, and how to handle multiple sources.

Other free text fields

For each of the free text fields below, there are separate columns for information from the law enforcement (LE) and coroner/medical examiner (CME) reports. E.g., both CME_CircumstancesOtherText and LE_CircumstancesOtherText are columns in the data.

  • CME/LE_CircumstancesOtherText (str): Circumstances that may have contributed to the death, but that are not captured by existing variables.
  • CME/LE_CrisisOtherDescription (str): Description of a crisis that occurred within two weeks of the incident, but is not captured in other variables for circumstance.
  • CME/LE_DisclosedToOtherDescription (str): If suicide intent was disclosed to someone other than the available categories (see 5.7.5 in the coding manual), description of who intent was disclosed to.
  • CME/LE_HealthProblemTypeDescription (str): A physical health problem that contributed to the death, but is not captured by the existing variables (see 5.7.13 in the coding manual).
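
The CME/LE shorthand above expands to one column per source. The sketch below builds those column names and collects any non-empty free text for a row. It assumes every field follows the same `CME_` / `LE_` prefix pattern as the CircumstancesOtherText example; the helper names and the sample row are hypothetical.

```python
FREE_TEXT_FIELDS = [
    "CircumstancesOtherText",
    "CrisisOtherDescription",
    "DisclosedToOtherDescription",
    "HealthProblemTypeDescription",
]
SOURCES = ["CME", "LE"]


def free_text_columns():
    """Expand the CME/LE shorthand into explicit column names."""
    return [f"{src}_{field}" for field in FREE_TEXT_FIELDS for src in SOURCES]


def gather_free_text(row):
    """Collect non-empty free text values from one row (a dict of cells)."""
    values = (str(row.get(col) or "").strip() for col in free_text_columns())
    return [v for v in values if v]


columns = free_text_columns()
# Synthetic placeholder row, not taken from the NVDRS data.
sample = {"CME_CircumstancesOtherText": "placeholder text", "LE_CircumstancesOtherText": ""}
collected = gather_free_text(sample)
```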

Mental health history and current state

  • MentalHealthProblem (bool): The person had a mental health condition at the time
  • DepressedMood (bool): The person was perceived to be depressed at the time
  • MentalIllnessTreatmentCurrnt (bool): Currently in treatment for a mental health or substance abuse problem
  • HistoryMentalIllnessTreatmnt (bool): History of ever being treated for a mental health or substance abuse problem
  • SuicideAttemptHistory (bool): History of attempting suicide previously
  • SuicideThoughtHistory (bool): History of suicidal thoughts or plans
  • AlcoholProblem (bool): The person struggled with alcohol dependence
  • SubstanceAbuseOther (bool): The person struggled with substance abuse not related to alcohol
  • OtherAddiction (bool): An addiction other than alcohol or other substance abuse, such as gambling, appears to have contributed
  • HistorySelfHarm (bool): History of self harm
  • TreatmentNonAdherence (bool): The person did not actively follow a treatment plan prescribed for their mental health or substance abuse treatment

Mental health diagnoses

  • CME/LE_MentalHealthDiagnosis1/2 (str, categorical): Types of mental illness diagnosis that applied. These correspond to specific options outlined in the coding manual (5.3.3). There are separate columns for information from the law enforcement (LE) and coroner/medical examiner (CME) reports.

  • CME/LE_MentalHealthDiagnosisOther (str): Additional mental illness diagnoses. This can be used to specify a third diagnosis if both MentalHealthDiagnosis1 and MentalHealthDiagnosis2 are filled, or to specify a diagnosis that is not one of the options in the coding manual (5.3.3). There are separate columns for information from the law enforcement (LE) and coroner/medical examiner (CME) reports.

Other contributing factors

The following binary variables indicate whether a variety of factors appear to have contributed to the death.

  • Argument (bool): An argument or conflict
  • IntimatePartnerProblem (bool): Problems with a current or former intimate partner
  • FamilyRelationship (bool): Relationship problems with a family member (other than an intimate partner)
  • FamilyStressor (bool): Other family stressors that are not relationship oriented
  • RelationshipProblemOther (bool): Problems with a friend or associate other than intimate partner or family member
  • SchoolProblem (bool): Problems at or related to school
  • EvictionOrLossOfHome (bool): A recent eviction or other loss of housing, or the threat of an eviction or loss of housing
  • LivingTransition (bool): Transitioned to an assisted living situation in the last 12 months
  • JobProblem (bool): Job problems
  • FinancialProblem (bool): Financial problems
  • RecentSuicideFriendFamily (bool): Recent suicide of a friend or family member
  • DeathFriendOrFamilyOther (bool): Death of a family member or friend due to something other than suicide
  • PrecipitatedByOtherCrime (bool): Another serious crime occurred before the death
  • RecentCriminalLegalProblem (bool): Criminal legal problems
  • LegalProblemOther (bool): Civil legal (non-criminal) problem
  • DisasterExposure (bool): Exposure to a natural or man-made disaster. E.g., earthquake, bombing, disease outbreak
  • CaregiverBurden (bool): Stress from acting as a caregiver for an ill, disabled, or elderly person
  • TraumaticAnniversary (bool): Incident occurred on or near the anniversary of a traumatic event in the person's life
  • CrisisOther (bool): A crisis occurred within the past two weeks that cannot be captured by other standard variables

Disclosure of intent

  • SuicideNote (bool): The person left a suicide note
  • SuicideIntentDisclosed (bool): The person disclosed their thoughts and/or plans to die by suicide to someone else within the last month
  • DisclosedTo{__} (bool): Binary variables indicating who intent was disclosed to. Variables are included for IntimatePartner, OtherFamilyMember, Friend, SocialMedia, HealthCareWorker, Neighbor, Unknown and Other. E.g., DisclosedToIntimatePartner
    • In the coding manual, this variable is listed as DisclosedIntentToWhom (5.7.5)
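
The DisclosedTo{__} placeholder can be expanded into explicit column names and tallied, as in the sketch below. It assumes each column is named by appending the recipient to `DisclosedTo`, as in the DisclosedToIntimatePartner example, and that a set flag is stored as `1` or `True`; both the storage format and the helper name are assumptions, and the rows are synthetic.

```python
from collections import Counter

RECIPIENTS = [
    "IntimatePartner", "OtherFamilyMember", "Friend", "SocialMedia",
    "HealthCareWorker", "Neighbor", "Unknown", "Other",
]
DISCLOSED_COLUMNS = [f"DisclosedTo{name}" for name in RECIPIENTS]


def disclosure_counts(rows):
    """Count, per DisclosedTo column, the rows where the flag is set."""
    counts = Counter()
    for row in rows:
        for col in DISCLOSED_COLUMNS:
            # Assumed encodings of a true value in the CSV.
            if str(row.get(col, "")).strip().lower() in {"1", "true"}:
                counts[col] += 1
    return counts


# Synthetic placeholder rows, not taken from the NVDRS data.
rows = [
    {"DisclosedToFriend": "1", "DisclosedToSocialMedia": "0"},
    {"DisclosedToFriend": "True", "DisclosedToSocialMedia": "1"},
    {"DisclosedToFriend": "0"},
]
counts = disclosure_counts(rows)
```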

Incident details

  • InjuryLocationType (str, categorical): The type of place where the suicide took place. This will be one of the options specified in the coding manual (4.3.3), which are "House, apartment"; "Motor vehicle (excluding school bus and public transportation)"; "Natural area (e.g., field, river, beaches, woods)"; "Street/road, sidewalk, alley"; "Park, playground, public use area"; and "Other"

  • WeaponType1 (str, categorical): Type of weapon used. This will be one of the options specified in the coding manual (6.1), which are "Firearm"; "Hanging, strangulation, suffocation"; "Poisoning"; "Fall"; "Other transport vehicle, eg, trains, planes, boats"; "Motor vehicle including buses, motorcycles"; "Drowning"; "Sharp instrument"; "Fire or burns"; "Blunt instrument"; "Unknown"; and "Other (e.g. taser, electrocution, nail gun)"

Physical health problems

  • PhysicalHealthProblem (bool): Physical health problem(s) appear to have contributed to the death
  • HealthProblem{__} (bool): Binary variables indicating whether different types of physical health problems contributed. Variables are included for Acute, TerminalIllness, ChronicPain, PainUnkDuration, Unknown, and Other. E.g., HealthProblemAcute
    • In the coding manual, this variable is listed as TypePhysicalHealthProblem (5.7.13)

Other relevant personal or home life details

  • HouseholdSubstanceAbuse (bool): Evidence of substance use in a child victim's household
  • AbusedAsChild (bool): History of abuse or neglect as a child
  • PriorCPSReport (bool): A prior Child Protective Services (CPS) report was filed on the child's household
  • VictimKnownToAuthorities (bool): History of contact with authorities. In the coding manual, this variable is listed as VictimContact (5.4.11)
  • InterpersonalViolencePerp (bool): The individual was a perpetrator of violence within the past month separate from the fatal incident. In the coding manual, this is listed as InterpersonalViolencePerpet (5.4.14)
  • InterpersonalViolenceVictim (bool): The individual experienced violence within the past month separate from the fatal incident
  • TraumaticBrainInjuryHistory (bool): History of traumatic brain injury

Good luck!


If you have any questions you can always visit the competition forum!