SNOMED CT Entity Linking Challenge

Link spans of text in clinical notes to concepts in the SNOMED CT clinical terminology.

$25,000 in prizes
Mar 2024
553 joined

Overview

Much of the world's healthcare data is stored in free-text documents, usually clinical notes taken by doctors. This unstructured data can be challenging to analyze and extract meaningful insights from. However, by applying a standardized terminology like SNOMED CT, healthcare organizations can convert this free-text data into a structured format that can be readily analyzed by computers, in turn stimulating the development of new medicines, treatment pathways, and better patient outcomes.

One way to analyze clinical notes is to identify and label the portions of each note that correspond to specific medical concepts. This process is called entity linking because it involves identifying candidate spans in the unstructured text (the entities) and linking them to a particular concept in a knowledge base of medical terminology.
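To make the two steps concrete, here is a minimal sketch of dictionary-based entity linking in Python. It is only an illustration of the idea, not the competition benchmark or any official method. The names `TOY_KB`, `LinkedSpan`, and `link_entities` are hypothetical, and the concept IDs are invented placeholders rather than real SNOMED CT identifiers.

```python
import re
from typing import NamedTuple


class LinkedSpan(NamedTuple):
    start: int       # character offset where the span begins (inclusive)
    end: int         # character offset where the span ends (exclusive)
    concept_id: int  # identifier of the linked concept


# Toy knowledge base: surface form -> placeholder concept ID.
# These IDs are invented for illustration and are NOT real SNOMED CT codes.
TOY_KB = {
    "chest pain": 1001,
    "hypertension": 1002,
    "htn": 1002,  # an abbreviation mapped to the same concept
}


def link_entities(text: str, kb: dict[str, int]) -> list[LinkedSpan]:
    """Find every case-insensitive, whole-word match of a KB term in text."""
    spans = []
    for term, concept_id in kb.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append(LinkedSpan(match.start(), match.end(), concept_id))
    # Sort by start offset so spans come back in reading order.
    return sorted(spans)


note = "Pt with HTN presents with chest pain."
for span in link_entities(note, TOY_KB):
    print(span, repr(note[span.start:span.end]))
```

Exact string matching like this only works when the note uses a term exactly as it is listed in the dictionary, so it serves as a way to picture the task rather than a realistic solution.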

However, clinical entity linking is hard! Medical notes are often rife with abbreviations (some of them context-dependent) and assumed knowledge. Furthermore, the target knowledge bases can easily include hundreds of thousands of concepts, many of which occur infrequently, leading to a “long tail” effect in the distribution of concepts.

Task

The objective of this competition is to link spans of text in clinical notes with specific concepts in the SNOMED CT clinical terminology. Participants will train models based on real-world doctors' notes which have been de-identified and annotated with SNOMED CT concepts by medically trained professionals. This is the largest publicly available dataset of labelled clinical notes, and you can be one of the first to use it!


Prizes

Competition End Date:

March 5, 2024, 11:59 p.m. UTC

Place   Prize Amount
1st     $12,500
2nd     $7,500
3rd     $5,000

How to compete

  1. Click the "Compete!" button in the sidebar to enroll in the competition.
  2. Get familiar with the problem through the problem description.
  3. Get access to the dataset of clinical notes and the training set of annotations by following the data access instructions.
  4. Create and train your own model. Check out the benchmark blog post for a good place to start!
  5. Package your model files together with your prediction code, following the runtime repository specification on the code submission format page.
  6. Click “Submissions” on the sidebar followed by “Make new submission” to submit your code as a zip archive for containerized execution. You’re in!

The challenge rules are in place to promote fair competition and useful solutions. If you are ever unsure whether your solution meets the competition rules, ask the challenge organizers in the competition forum or send an email to info@drivendata.org.


This challenge is sponsored by SNOMED International

In partnership with Veratai and PhysioNet

Image courtesy of SNOMED International