Unsupervised Wisdom: Explore Medical Narratives on Older Adult Falls

Use unsupervised machine learning approaches to extract insights from emergency department narratives about how, when, and why older adults (age 65+) fall.

$70,000 in prizes
Oct 2023
660 joined

Finding more efficient and accurate ways to extract insights from unstructured medical data helps public health researchers better understand the circumstances behind older adult falls. This helps drive communication, policy, and further research initiatives to reduce morbidity and mortality.

— Royal Law, Data Science Team Lead, CDC's National Center for Injury Prevention and Control


Falls are the leading cause of injury-related death among adults 65 and older. Preventing unintentional older adult falls is a priority for the Centers for Disease Control and Prevention's National Center for Injury Prevention and Control (CDC/NCIPC), which pursues this goal in part by researching the factors associated with the occurrence and severity of older adult falls.

An important source of information about older adult falls and other injuries is the National Electronic Injury Surveillance System (NEISS), which produces public, standardized data about injuries that present to the emergency departments of a sample of hospitals in the United States. NEISS data include a narrative field that describes the patients, their injuries, and their treatment. However, extracting insights about older adult falls and other injuries from such medical narratives typically involves manual coding procedures that are resource-intensive and difficult to scale.

The Solution

Machine learning techniques for natural language processing (NLP) have the potential to greatly expand the CDC/NCIPC’s capacity for analyzing medical record narratives and extracting insights that can inform fall prevention strategies. The goal of this challenge was to identify effective methods of using unsupervised machine learning and NLP to extract insights about older adult falls from NEISS emergency department narratives.

Solvers submitted solutions in the format of an analysis notebook (in R or Python) and a 1-3 page executive summary that highlighted their key findings, summarized their approach, and included select visualizations from their analyses. Solutions were evaluated by expert judges who considered how novel, well-communicated, rigorous, and insightful they were.

The Results

Across 45 submissions, participants explored a wide range of unsupervised NLP methodologies, including large language models (LLMs), clustering, dimensionality reduction, topic modeling, semantic search, named entity recognition, and more. In addition to the overall prizes, bonus prizes were awarded for the most novel approach, most compelling insight, best visualization, best null result, and most helpful community code post.
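To give a concrete sense of what an unsupervised approach to narrative text can look like, here is a minimal sketch in plain Python. It is not taken from any submission. It swaps the embedding-based semantic search mentioned above for a simpler lexical stand-in: TF-IDF vectors compared by cosine similarity. The narratives are invented for illustration and are not real NEISS records, and the stopword list and function names are assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical narratives written for illustration; NOT real NEISS records.
NARRATIVES = [
    "82 yom slipped on wet bathroom floor and hit head",
    "77 yof fell off ladder while cleaning gutters fractured wrist",
    "90 yof slipped in bathroom shower and bruised hip",
    "70 yom fell from ladder while painting ceiling injured shoulder",
]

# Assumed stopword list: function words plus the age/sex shorthand tokens.
STOPWORDS = {"and", "on", "in", "off", "from", "while", "yom", "yof"}


def tokenize(text):
    """Lowercase, split on whitespace, drop numbers and stopwords."""
    return [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many narratives each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed inverse document frequency, so rarer terms weigh more.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(index, vectors):
    """Index of the narrative most similar to the one at `index`."""
    scores = [(cosine(vectors[index], v), j) for j, v in enumerate(vectors) if j != index]
    return max(scores)[1]


if __name__ == "__main__":
    vecs = tfidf_vectors(NARRATIVES)
    for i, text in enumerate(NARRATIVES):
        print(f"{text!r} is closest to {NARRATIVES[most_similar(i, vecs)]!r}")
```

With no labels supplied, the bathroom-slip narratives pair up with each other, as do the ladder falls. Grouping narratives by shared circumstances in this way is the kind of structure that clustering and topic modeling aim to surface at scale.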

See the results announcement for more information on the winning approaches and the teams who developed them. All of the prize-winning notebooks and executive summaries from this competition are linked below and available for anyone to use and learn from.