VisioMel Challenge: Predicting Melanoma Relapse

Use digitized microscopic slides to predict the likelihood of melanoma relapse within the next five years. #health

€25,000 in prizes
May 2023
541 joined

Problem description

The data for this challenge includes thousands of microscopic slides of skin melanomas from medical centers across France. Your task is to estimate the likelihood that a relapse will occur within five years.

Training set


The features in this challenge include Whole Slide Images (WSIs) along with clinical metadata.

Images

You are provided with WSIs of melanomas in the form of pyramidal TIFs, which are a multi-resolution, tiled format. Each resolution is stored as a separate layer or "page" within the TIF. Page 0 contains the image at the full resolution of the WSI. Each subsequent page is the previous page downsampled by a factor of two. For more detail along with tips and tricks for working with WSIs, see the data resources page.

Each patient is represented by one pyramidal TIF, which corresponds to one row in the metadata CSV. The filename column is the unique identifier that connects the images to the metadata.
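Because each page halves the previous one, the pixel size at any page follows from the level-0 resolution listed in the metadata. Here is a minimal sketch of that arithmetic. The function names are illustrative and not part of any challenge tooling, and the floor rounding of page dimensions is an assumption (a given TIF writer may round differently).

```python
def page_resolution(level0_mpp: float, page: int) -> float:
    """Microns per pixel at a page, assuming each page halves the previous one."""
    if page < 0:
        raise ValueError("page must be non-negative")
    return level0_mpp * (2 ** page)


def page_shape(level0_width: int, level0_height: int, page: int) -> tuple[int, int]:
    """Approximate (width, height) at a page; floor rounding is an assumption."""
    if page < 0:
        raise ValueError("page must be non-negative")
    scale = 2 ** page
    return max(1, level0_width // scale), max(1, level0_height // scale)


# Level-0 resolution listed for 1u4lhlqb.tif in the metadata example.
print(page_resolution(0.264384, 6))
```

At page 6 each pixel spans roughly 16.9 microns, which is why a low-resolution page is convenient for locating tissue before reading higher-resolution pages.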

Image example


Here's an example image of 1u4lhlqb.tif: Image of melanoma, from page=6 of 1u4lhlqb.tif

Note: Slides may contain markings made by pathologists around the melanoma, or black blurs along the edges made by the slide holder; neither is related to relapse. Parts of a slide may also be out of focus or consist of white space; these regions are likewise unrelated to relapse and are not important to the task. For examples of these irrelevant regions, see the data resources page.

To help you understand how pathologists analyze a slide to predict relapse, we have provided 16 annotated slides on the data download page. The annotations show how clinical variables are measured and outline healthy tissue and lesions (but not necessarily all lesions present on a slide). Visit the data resources page for more information.

All training images are hosted in a public S3 bucket. The AWS CLI is useful for downloading images (you will probably need to pass the --no-sign-request argument). train_metadata.csv contains filepaths by bucket region. The data is replicated to buckets hosted in the US, the EU (Germany), and Asia (Singapore). Pick the bucket geographically closest to your machine to maximize transfer speeds. Checksums are also provided, which can be used to validate that downloaded data is not corrupted. See data_download_instructions.txt on the data download page for more detailed instructions.

Metadata

Alongside the WSIs, you are provided with a metadata file that contains clinical variables collected from the patient at the time of initial diagnosis. These include demographic factors (e.g., age and sex) as well as some variables related to the tumor (e.g., its location and thickness).

train_metadata.csv contains the following variables:

  • filename (str) - unique identifier for each WSI
  • age (str) - age range of the patient at initial diagnosis
  • sex (int) - sex of the patient at initial diagnosis, where 1=male and 2=female
  • body_site (str) - the site of the melanoma at initial diagnosis
  • melanoma_history (str) - whether the patient had melanoma before
  • breslow (str) - thickness of the melanoma in mm at initial diagnosis
  • ulceration (str) - whether the melanoma had ulceration, which is a total loss of epidermal tissue
  • resolution (float) - the resolution at level 0 of the slide in microns per pixel
  • tif_cksum (str) - the result of running the Unix cksum command on the TIF image
  • tif_size (int) - the file size in bytes of the TIF image
  • us_tif_url (str) - file location of the pyramidal TIF in the public S3 bucket in the US East region
  • eu_tif_url (str) - file location of the pyramidal TIF in the public S3 bucket in the EU region
  • asia_tif_url (str) - file location of the pyramidal TIF in the public S3 bucket in the Asia Pacific region

Train metadata example


For example, image 1u4lhlqb.tif has the following information in train_metadata.csv:

filename 1u4lhlqb.tif
age [32:34[
sex 2
body_site thigh
melanoma_history YES
breslow <0.8
ulceration NO
resolution 0.264384
tif_cksum 3028450373
tif_size 747151312
us_tif_url s3://.../1u4lhlqb.tif
eu_tif_url s3://.../1u4lhlqb.tif
asia_tif_url s3://.../1u4lhlqb.tif
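Several of these variables arrive as strings that need decoding before modeling. The sketch below handles only the formats visible in the example row. It assumes every age value follows the half-open bracket form [32:34[ and that melanoma_history and ulceration take the values YES or NO; other rows may use formats not shown here. The function names and output keys are illustrative.

```python
import re

# Matches a half-open bracket such as "[32:34[" (lower bound included, upper excluded).
_AGE_RANGE = re.compile(r"^\[(\d+):(\d+)\[$")
_YES_NO = {"YES": True, "NO": False}


def parse_age_range(value: str) -> tuple[int, int]:
    """Parse an age bracket such as '[32:34[' into (lower, upper)."""
    match = _AGE_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized age range: {value!r}")
    return int(match.group(1)), int(match.group(2))


def encode_row(row: dict) -> dict:
    """Turn one metadata row (all values as strings) into model-ready features."""
    low, high = parse_age_range(row["age"])
    return {
        "filename": row["filename"],
        "age_mid": (low + high) / 2,
        "is_female": int(row["sex"]) == 2,  # 1=male, 2=female
        "body_site": row["body_site"],
        # Unknown values map to None rather than being guessed.
        "melanoma_history": _YES_NO.get(row["melanoma_history"]),
        "ulceration": _YES_NO.get(row["ulceration"]),
        "breslow": row["breslow"],  # kept categorical; only "<0.8" is shown above
    }


example = {
    "filename": "1u4lhlqb.tif", "age": "[32:34[", "sex": "2", "body_site": "thigh",
    "melanoma_history": "YES", "breslow": "<0.8", "ulceration": "NO",
}
features = encode_row(example)
```

Breslow is left as a category because the example shows only a threshold value, not a measured thickness.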

Labels

The labels for this dataset come from patients' medical records and indicate whether the patient was diagnosed with a melanoma relapse within five years of initial diagnosis.

train_labels.csv contains the following variables:

  • filename (str) - unique identifier for each patient
  • relapse (int) - whether there was a relapse, where 0=no relapse and 1=relapse

Label example


For example, the first five rows in train_labels.csv have these values:

filename relapse
1u4lhlqb.tif 0
rqumqnfp.tif 0
bu5xt1xm.tif 0
dibvu7wk.tif 0
qsza4coh.tif 0
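Since filename is the shared key, labels can be attached to metadata rows with a simple join. Here is a minimal sketch using only the standard library. It assumes both files are comma-delimited with a header row, and the in-memory strings stand in for the real files with only a subset of columns.

```python
import csv
import io


def index_by_filename(csv_file) -> dict:
    """Read a CSV with a 'filename' column into {filename: row_dict}."""
    return {row["filename"]: row for row in csv.DictReader(csv_file)}


def attach_labels(metadata: dict, labels: dict) -> list[dict]:
    """Join metadata rows to labels on filename, failing loudly on mismatches."""
    missing = sorted(set(metadata) - set(labels))
    if missing:
        raise KeyError(f"no label for: {missing}")
    return [
        {**row, "relapse": int(labels[name]["relapse"])}
        for name, row in metadata.items()
    ]


# Stand-ins for train_metadata.csv and train_labels.csv (subset of columns).
metadata_csv = io.StringIO("filename,age,sex\n1u4lhlqb.tif,[32:34[,2\n")
labels_csv = io.StringIO("filename,relapse\n1u4lhlqb.tif,0\nrqumqnfp.tif,0\n")

joined = attach_labels(index_by_filename(metadata_csv), index_by_filename(labels_csv))
```

With the real files, pass handles opened with newline="" in place of the StringIO objects.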

Test set


The test set images and metadata are only accessible in the runtime container and are mounted at /code_execution/data.

test_metadata.csv contains the following variables:

  • filename (str) - unique identifier for each WSI
  • age (str) - age range of the patient at initial diagnosis
  • sex (int) - sex of the patient at initial diagnosis, where 1=male and 2=female
  • body_site (str) - the site of the melanoma at initial diagnosis
  • melanoma_history (str) - whether the patient had melanoma before
  • resolution (float) - the resolution at level 0 of the slide in microns per pixel

Note that breslow and ulceration are not included in the test metadata. You may still find these variables useful in model training. See the data resources page for more on how pathologists use breslow depth and ulceration to make a prognosis. Annotated slide examples are also provided on the data download page.
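A small sketch of what this means in practice: any model input must come from the columns shared by both files, while the train-only columns can be used in other ways. Treating them as auxiliary prediction targets is one option, not something the challenge prescribes. The column lists are copied from the descriptions above; the helper name is illustrative.

```python
TRAIN_COLUMNS = [
    "filename", "age", "sex", "body_site", "melanoma_history",
    "breslow", "ulceration", "resolution",
]
TEST_COLUMNS = [
    "filename", "age", "sex", "body_site", "melanoma_history", "resolution",
]


def split_columns(train_cols: list[str], test_cols: list[str]) -> tuple[list[str], list[str]]:
    """Return (columns available at inference, columns available only in training)."""
    test_set = set(test_cols)
    shared = [c for c in train_cols if c in test_set]
    train_only = [c for c in train_cols if c not in test_set]
    return shared, train_only


shared, train_only = split_columns(TRAIN_COLUMNS, TEST_COLUMNS)
print(train_only)
```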

Performance metric


Performance is evaluated according to log loss. Log loss (a.k.a. logistic loss or cross-entropy loss) penalizes confident but incorrect predictions. It also rewards confidence scores that are well-calibrated probabilities, meaning that they accurately reflect the long-run probability of being correct. This is an error metric, so a lower value is better.

Log loss for a single observation is calculated as follows:

$$L_{\log}(y, p) = -(y \log (p) + (1 - y) \log (1 - p))$$

where $y$ is a binary variable indicating whether relapse occurred and $p$ is the user-predicted probability of relapse. The loss for the entire dataset is the average loss across all observations.
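The formula translates directly into code. The sketch below is for local validation only. Clipping probabilities away from 0 and 1 avoids taking the logarithm of zero; the eps value is an assumption and may differ from what the official scorer uses.

```python
import math


def log_loss(y_true, p_pred, eps: float = 1e-15) -> float:
    """Mean log loss over paired binary labels and predicted probabilities."""
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# A patient without relapse (y=0): a cautious versus a confidently wrong prediction.
print(log_loss([0], [0.1]))
print(log_loss([0], [0.9]))
```

The first call gives about 0.105 and the second about 2.303, which shows how heavily confident mistakes are penalized.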

Note: log loss can often be improved with calibration. A well-calibrated model outputs predictions that are directly interpretable as probabilities.
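As one illustration of calibration, the sketch below applies temperature scaling: predicted probabilities are converted to logits, divided by a temperature T, and converted back, with T chosen by grid search to minimize log loss on held-out validation data. This is one common approach, not one the challenge prescribes. The grid, the clipping constant, and the toy validation values are all assumptions.

```python
import math


def _clip(p: float, eps: float = 1e-6) -> float:
    return min(max(p, eps), 1 - eps)


def mean_log_loss(y_true, p_pred) -> float:
    """Mean log loss with clipped probabilities."""
    losses = [
        -(y * math.log(_clip(p)) + (1 - y) * math.log(1 - _clip(p)))
        for y, p in zip(y_true, p_pred)
    ]
    return sum(losses) / len(losses)


def apply_temperature(p_pred, temperature: float) -> list[float]:
    """Rescale probabilities in logit space; T > 1 softens, T < 1 sharpens."""
    scaled = []
    for p in p_pred:
        logit = math.log(_clip(p) / (1 - _clip(p)))
        scaled.append(1.0 / (1.0 + math.exp(-logit / temperature)))
    return scaled


def fit_temperature(y_val, p_val, grid=None) -> float:
    """Pick the grid temperature with the lowest validation log loss."""
    grid = grid or [i / 10 for i in range(5, 51)]  # 0.5 to 5.0, includes 1.0
    return min(grid, key=lambda t: mean_log_loss(y_val, apply_temperature(p_val, t)))


# Toy validation set with two confidently wrong predictions (made-up values).
y_val = [1, 0, 1, 0, 0, 1]
p_val = [0.99, 0.02, 0.03, 0.98, 0.01, 0.97]
best_t = fit_temperature(y_val, p_val)
before = mean_log_loss(y_val, p_val)
after = mean_log_loss(y_val, apply_temperature(p_val, best_t))
```

Because the grid contains T = 1, the fitted temperature can never do worse on the validation set than the uncalibrated predictions. Whether it helps on unseen data still has to be checked on a separate split.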

Submission format


This is a code execution challenge! Rather than submitting your predicted labels, you'll package everything needed to do inference and submit that for containerized execution.

Your code submission must contain a main.py script that reads in the test slide image set, generates predictions for each slide image, and outputs a single submission.csv containing the likelihood of relapse for all the slide images.
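A rough skeleton of such a script is sketched below. The official submission format page is the authority on the required columns and output location. The sketch assumes submission.csv uses the same filename and relapse columns as train_labels.csv and is written to the working directory. The constant-probability predictor is a placeholder, and the demo filename is made up.

```python
import csv
import io
from pathlib import Path

DATA_DIR = Path("/code_execution/data")  # mount point given in the test set section


def predict_relapse(row: dict) -> float:
    """Placeholder predictor returning a constant; replace with real inference."""
    return 0.5


def write_submission(metadata_file, output_file, predict=predict_relapse) -> int:
    """Write one probability per test row; returns the number of rows written."""
    writer = csv.writer(output_file)
    writer.writerow(["filename", "relapse"])  # assumed header, mirrors train_labels.csv
    count = 0
    for row in csv.DictReader(metadata_file):
        probability = float(predict(row))
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability out of range for {row['filename']}")
        writer.writerow([row["filename"], probability])
        count += 1
    return count


def main() -> None:
    """Entry point for the runtime container (the output path is an assumption)."""
    with open(DATA_DIR / "test_metadata.csv", newline="") as src, \
            open("submission.csv", "w", newline="") as dst:
        write_submission(src, dst)


# Dry run on in-memory data; "example_slide.tif" is a made-up filename.
demo_in = io.StringIO("filename,age,sex\nexample_slide.tif,[32:34[,2\n")
demo_out = io.StringIO()
rows_written = write_submission(demo_in, demo_out)
```

In an actual main.py, the dry run would be dropped and main() called at the bottom of the file.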

See details on the submission format and process here.

If you're wondering how to get started, check out the benchmark blog post!

Good luck and enjoy this problem! If you have any questions you can always visit the user forum.