SNOMED CT Entity Linking Benchmark

A benchmark for linking text in medical notes to entities in SNOMED CT (Clinical Terms), a comprehensive clinical terminology. #health


Abstract

FAISS + Qwen3 Example Solution

This example solution demonstrates a two-stage approach to SNOMED CT entity linking:

  1. Mention detection via exact matching of normalized token spans against SNOMED CT terminology, backed by a FAISS index built from TF-IDF character n-gram vectors for candidate retrieval.
  2. LLM disambiguation using Qwen3-4B to select the best candidate concept given clinical context and SNOMED CT metadata.

How it works

Terminology index (FaissTerminologyIndex)

The index is built from flattened_terminology.jsonl. For each concept, the preferred name and all synonyms are normalized and vectorized using TF-IDF character n-grams (3-5 characters). The sparse vectors are reduced to 256 dimensions with TruncatedSVD and indexed in a FAISS inner-product index for fast cosine similarity search.
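The pipeline above can be sketched as follows. This is a minimal illustration, not the benchmark code: the concept list is made up, and a NumPy inner product stands in for the FAISS `IndexFlatIP` so the sketch has no FAISS dependency.

```python
# Sketch of the terminology index: TF-IDF char n-grams -> TruncatedSVD -> cosine
# search. Toy concept data; NumPy inner product stands in for faiss.IndexFlatIP.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# (concept_id, term) pairs; synonyms map to the same concept id.
terms = [
    ("22298006", "myocardial infarction"),
    ("22298006", "heart attack"),
    ("38341003", "hypertensive disorder"),
    ("38341003", "high blood pressure"),
    ("73211009", "diabetes mellitus"),
    ("195967001", "asthma"),
]

def normalize(s):
    return " ".join(s.lower().split())

texts = [normalize(t) for _, t in terms]
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
X = vec.fit_transform(texts)                       # sparse TF-IDF char n-grams
svd = TruncatedSVD(n_components=4, random_state=0)  # 256 dims in the real index
Xr = svd.fit_transform(X)
Xr /= np.linalg.norm(Xr, axis=1, keepdims=True)    # unit rows: dot == cosine

def search(query, k=2):
    q = svd.transform(vec.transform([normalize(query)])).ravel()
    q /= np.linalg.norm(q)
    scores = Xr @ q                                 # inner-product search
    top = np.argsort(-scores)[:k]
    return [(terms[i][0], terms[i][1], float(scores[i])) for i in top]

print(search("myocardial infarction"))
```

Because the row vectors are L2-normalized, an inner-product index returns cosine similarity, which is why the real solution can use a FAISS inner-product index for cosine search.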

Mention detection (iter_mentions)

Clinical notes are tokenized and all contiguous subsequences of up to 6 tokens are checked against the terminology index. Only spans that exactly match a normalized term are kept. Overlapping mentions are resolved by preferring longer, higher-confidence spans.
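The span enumeration and overlap resolution can be sketched like this; the term dictionary and note text are illustrative, not from the benchmark data.

```python
# Sketch of mention detection: every contiguous window of up to 6 tokens is
# normalized and looked up; overlapping matches are resolved longest-first.
import re

TERMS = {"myocardial infarction": "22298006", "chest pain": "29857009"}
MAX_LEN = 6

def normalize(s):
    return " ".join(s.lower().split())

def iter_mentions(note):
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", note)]
    candidates = []
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + 1 + MAX_LEN, len(tokens) + 1)):
            start, end = tokens[i][0], tokens[j - 1][1]
            key = normalize(note[start:end])
            if key in TERMS:                       # exact match only
                candidates.append((start, end, key, TERMS[key]))
    # Prefer longer spans; drop any span overlapping one already kept.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))
    kept = []
    for c in candidates:
        if all(c[1] <= k[0] or c[0] >= k[1] for k in kept):
            kept.append(c)
    return sorted(kept, key=lambda c: c[0])

note = "Patient reports chest pain consistent with myocardial infarction."
print(iter_mentions(note))
```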

LLM disambiguation (LLMLinker)

When multiple candidate concepts match a mention, the LLM is prompted with the mention text, surrounding clinical context (120 chars each side), and a formatted list of candidates including SNOMED CT metadata (hierarchy type, synonyms, parent concepts, child concepts, and defining relationships). The model generates a concept_id, which is extracted from the output.
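A sketch of the prompt construction and answer parsing, with the model call itself omitted. The field names and prompt wording here are illustrative; the real LLMLinker prompt may differ.

```python
# Sketch of LLM disambiguation: build a prompt with mention, +/-120 chars of
# context, and candidate metadata; then extract a concept_id from the output.
import re

def build_prompt(note, start, end, candidates, window=120):
    mention = note[start:end]
    context = note[max(0, start - window):min(len(note), end + window)]
    lines = [
        f"Mention: {mention}",
        f"Context: ...{context}...",
        "Candidates:",
    ]
    for c in candidates:
        lines.append(
            f"- {c['concept_id']} | {c['name']} ({c['hierarchy']}) | "
            f"synonyms: {', '.join(c['synonyms'])} | "
            f"parents: {', '.join(c['parents'])}"
        )
    lines.append("Answer with the single best concept_id.")
    return "\n".join(lines)

def extract_concept_id(llm_output, candidates):
    # Accept only ids that were actually offered, so a hallucinated
    # number in the model output is never returned.
    valid = {c["concept_id"] for c in candidates}
    for m in re.finditer(r"\d{6,18}", llm_output):
        if m.group() in valid:
            return m.group()
    return None
```

Restricting extraction to the offered candidate ids is a simple guard against the model emitting an out-of-vocabulary concept id.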

Ideas for improvement

  • Better mention detection. Add fuzzy matching, edit distance, or a trained NER model (e.g., spaCy with a clinical model, BioBERT) to catch misspellings and abbreviations.
  • Better embeddings. Replace TF-IDF with sentence-transformers or a domain-specific embedding model for the FAISS index.
  • Larger models. Swap in a larger model or fine-tune on SNOMED CT entity linking examples.
  • Fine-tuning. This example doesn't even use any of the training data! A straightforward next step would be to fine-tune on the training set for better disambiguation performance.
  • Richer terminology. Enrich the JSONL with grandparent concepts, reference set memberships, or other SNOMED CT metadata by extending flatten_terminology.py.
  • Prompt tuning. Add few-shot examples, adjust the context window, or restructure the prompt for better disambiguation.
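As a taste of the first idea, fuzzy matching can be sketched with the standard library's difflib in place of a dedicated edit-distance library; the term list is illustrative.

```python
# Sketch of fuzzy span lookup: tolerate small misspellings that exact
# matching would miss, using difflib's similarity ratio.
import difflib

TERMS = {"myocardial infarction": "22298006", "diabetes mellitus": "73211009"}

def fuzzy_lookup(span, cutoff=0.85):
    # Return the concept id of the closest term, or None below the cutoff.
    match = difflib.get_close_matches(span.lower(), TERMS, n=1, cutoff=cutoff)
    return TERMS[match[0]] if match else None

print(fuzzy_lookup("myocardial infarcton"))  # misspelled mention
```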

Scores

Reference Scores
Name                     Macro char IoU    Support-weighted char IoU
FAISS + Qwen3 Instruct   0.2321            0.2160