SNOMED CT Entity Linking Benchmark

A benchmark for linking text in medical notes to entities in SNOMED CT (SNOMED Clinical Terms).


Introducing DrivenData Benchmarks!

DrivenData Benchmarks provide a rigorous baseline for comparing different approaches to the same problem. Submit multiple models, see how your methods perform, and contribute to a public body of knowledge about what works.

Overview

Much of the world's healthcare data is stored in free-text documents, such as clinical notes taken by doctors. Because this information is unstructured, it can be difficult to analyze and extract meaningful insights. However, by applying a standardized terminology like SNOMED CT, healthcare organizations can convert this free-text data into a structured format that computers can readily analyze, helping to stimulate the development of new medicines, treatment pathways, and better patient outcomes.

One way to analyze clinical notes is to identify and label the portions of each note that correspond to specific medical concepts. This process is called entity linking because it involves identifying candidate spans in the unstructured text (the entities) and linking them to a particular concept in a knowledge base of medical terminology.
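For instance, the output of an entity-linking system can be thought of as a set of character spans in a note, each paired with a SNOMED CT concept identifier. The toy note and concept IDs below are purely illustrative (the SCTIDs shown are commonly cited for chest pain and hypertension, but should be verified against a SNOMED CT browser):

```python
# Illustrative only: entity-linking output as character spans mapped to
# SNOMED CT concept IDs (SCTIDs). Offsets index into the note text.
note_text = "Pt presents with chest pain and a history of HTN."

predictions = [
    {"start": 17, "end": 27, "concept_id": 29857009},  # "chest pain"
    {"start": 45, "end": 48, "concept_id": 38341003},  # "HTN" -> hypertension
]

for span in predictions:
    mention = note_text[span["start"]:span["end"]]
    print(f"{mention!r} -> {span['concept_id']}")
```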

Clinical entity linking is inherently challenging. Medical notes are often rife with abbreviations (some of them context-dependent) and assumed knowledge. Furthermore, the target knowledge bases can easily include hundreds of thousands of concepts, many of which occur infrequently, leading to a "long tail" in the distribution of concepts.

Benchmark task

The objective of this benchmark is to compare how well different approaches can structure the unstructured data in clinical notes for meaningful use and analysis, using the SNOMED CT clinical terminology. Participants will train models on real-world doctors' notes that have been de-identified and annotated with SNOMED CT concepts by medically trained professionals. This is the largest publicly available dataset of labelled clinical notes! By submitting to this benchmark, you can see how well your model performs and compare it against other approaches.

How to submit to the benchmark

  1. Click the "Register!" button in the sidebar to register for the benchmark.
  2. Get familiar with the problem through the problem description.
  3. Get access to the dataset of clinical notes and the training set of annotations by following the data access instructions.
  4. Create and train your own model. Check out the benchmark blog post or the winning solutions of the original competition as good places to start!
  5. Package your model files with the code to make predictions, following the runtime repository specification on the code submission format page (an illustrative sketch of a prediction entrypoint appears after this list).
  6. Create a model description by clicking on "My Models" in the sidebar followed by "Create new model". Give your model a name and fill out details about your approach in the abstract field. (Note: this only creates a description of your approach. You do not need to share your actual model.)
  7. Click on your model's "Make new code job submission" to submit your code as a zip archive for containerized execution.
  8. Once you have a successful submission, click "Make public" to share your model description with the community and be added to the benchmark leaderboard!
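The authoritative submission interface is defined by the runtime repository and the code submission format page. As a rough illustration of step 5 only, here is a minimal sketch of what a prediction entrypoint might look like; the file paths and column names (`note_id`, `start`, `end`, `concept_id`) are assumptions for illustration, not the official specification.

```python
# main.py -- hedged sketch of a prediction entrypoint, NOT the official
# format. Assumes: test notes arrive as a CSV with note_id and text
# columns, and predictions are written as rows of
# (note_id, start, end, concept_id).
from pathlib import Path

import pandas as pd

NOTES_PATH = Path("data/test_notes.csv")   # assumed input location
SUBMISSION_PATH = Path("submission.csv")   # assumed output location


def predict_spans(text: str) -> list[dict]:
    """Stand-in for your trained model: return entity spans with SNOMED CT IDs."""
    # Replace with your model's inference logic.
    return []


def main() -> None:
    notes = pd.read_csv(NOTES_PATH)
    rows = []
    for note in notes.itertuples():
        for span in predict_spans(note.text):
            rows.append(
                {
                    "note_id": note.note_id,
                    "start": span["start"],
                    "end": span["end"],
                    "concept_id": span["concept_id"],
                }
            )
    pd.DataFrame(rows, columns=["note_id", "start", "end", "concept_id"]).to_csv(
        SUBMISSION_PATH, index=False
    )


if __name__ == "__main__":
    main()
```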

Rules

The benchmark rules are in place to promote fair benchmarking and useful solutions. If you are ever unsure whether your solution meets the benchmark rules, ask the challenge organizers in the forum or send an email to info@drivendata.org.


This benchmark is sponsored by SNOMED International


In partnership with Veratai and PhysioNet


Image courtesy of SNOMED International