Sample of a cervical biopsy

Overview

A biopsy is a sample of tissue examined at a microscopic level to diagnose cancer or signs of pre-cancer. While most diagnoses are still made with light microscopes, digital pathology has developed considerably over the past decade as it has become possible to digitize slides into "virtual slides" or "whole slide images" (WSIs). These very large image files contain all the information required to diagnose lesions as malignant or benign.

Making this diagnosis is no easy task. It requires specialized training and careful examination of microscopic tissue. Approaches in machine learning are already able to help analyze WSIs by measuring or counting areas of the image under a pathologist's supervision. In addition, computer vision has shown some potential to classify tumor subtypes, and in time may offer a powerful tool to aid pathologists in making the most accurate diagnoses.

This challenge focuses on epithelial lesions of the uterine cervix, and features a unique collection of thousands of WSIs collected from medical centers across France. The lesions in slides like these are most often benign (class 0), but some others have low malignant potential (class 1) or high malignant potential (class 2), and others may already be invasive cancers (class 3).

Using this unique dataset, your objective is to detect the most severe epithelial lesions of the uterine cervix present in these biopsy images.
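To illustrate what "most severe lesion" means in terms of the four classes above, here is a minimal sketch in Python. The function name `slide_label` and the choice to treat a slide with no detected lesion as benign are assumptions made for illustration; they are not part of the challenge specification.

```python
from typing import Iterable

# Class labels as described in the overview above.
CLASS_NAMES = {
    0: "benign",
    1: "low malignant potential",
    2: "high malignant potential",
    3: "invasive cancer",
}


def slide_label(lesion_classes: Iterable[int]) -> int:
    """Return the most severe class among the lesions found on one slide.

    Assumption (hypothetical): a slide with no detected lesion counts as
    benign (class 0).
    """
    classes = list(lesion_classes)
    for c in classes:
        if c not in CLASS_NAMES:
            raise ValueError(f"unknown class: {c}")
    # Higher class index means higher severity, so the slide label is the max.
    return max(classes, default=0)


# Example: a slide with mostly benign regions and one class 2 lesion.
label = slide_label([0, 1, 0, 2])
print(label, CLASS_NAMES[label])
```

Because the classes are ordered by severity, one high-grade region is enough to determine the label of the whole slide, however much benign tissue surrounds it.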

This is a sizable dataset (700 GB) of extremely high resolution images (e.g. 150,000 x 85,000 pixels). Given the scale of the dataset, handling the data efficiently is a critical problem to solve. It's not possible to just push these whole images through a pretrained ImageNet network, so you'll have to get more creative.
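A quick back-of-the-envelope calculation shows why a slide of this size cannot simply be loaded as one array. The sketch below assumes 8-bit RGB pixels (3 bytes per pixel) with no compression; the actual storage format of the slides may differ.

```python
def uncompressed_bytes(width: int, height: int,
                       channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Size of a fully decoded image in bytes (assumes 8-bit RGB by default)."""
    return width * height * channels * bytes_per_channel


# Example dimensions quoted in the text above.
width, height = 150_000, 85_000

pixels = width * height
size_gb = uncompressed_bytes(width, height) / 1e9

print(f"{pixels / 1e9:.2f} gigapixels")      # 12.75 gigapixels
print(f"{size_gb:.2f} GB when fully decoded")  # 38.25 GB
```

Under these assumptions a single decoded slide would exceed the memory of most machines, which is one reason to work on smaller regions or lower resolutions rather than the full image.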

That's why we've made this a code execution challenge! That means you will be submitting code that runs inference in the cloud. Your model must run fast enough on this large-scale data to be useful in practice. This setup rewards models that perform well on unseen images and brings these innovations one step closer to impact.

Additional challenge notes

Challenge data: Whole slide images are digital formats that allow glass slides to be viewed, managed, shared, and analyzed on a computer monitor. These extremely high resolution images require special software to be able to read and manipulate in memory. You can find more information and tips for working with the data on the Data Resources page.
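One common way to keep memory use bounded is to process a slide tile by tile rather than all at once. The sketch below only generates tile coordinates in plain Python; the function name `tile_boxes` and the 512-pixel tile size are hypothetical choices, and reading the pixels for each box would be done with whichever WSI library you choose.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def tile_boxes(width: int, height: int, tile_size: int) -> Iterator[Box]:
    """Yield non-overlapping boxes that cover a width x height image.

    Tiles on the right and bottom edges are clipped so they stay in bounds.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            w = min(tile_size, width - left)
            h = min(tile_size, height - top)
            yield left, top, w, h


# Count the tiles needed for the example slide size, without holding them all.
n_tiles = sum(1 for _ in tile_boxes(150_000, 85_000, 512))
print(n_tiles)
```

Since the generator yields one box at a time, only the current tile needs to be in memory. In practice, many tiles contain only background, so filtering them out early can save a large share of inference time.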

External data & annotation: As noted in the Challenge Rules, external data and pre-trained models are allowed in this competition as long as they are freely and publicly available. In addition, participants are welcome to add their own private annotations to the challenge data. At the end of the challenge, top-performing participants will need to publicly share any private annotations and approaches in order to be eligible for a prize.

Research note: A focus of this challenge is to feature a new dataset for research and to engage pathologists, data scientists, and developers in working with it. As with any research dataset like this one, initial algorithms may pick up on correlations that are incidental to diagnosis. Solutions in this challenge are intended to serve as a starting point for continued research and development. The challenge organizers intend to make the collection of WSI data available online after the competition for ongoing improvement.

Competition End Date:

Oct. 29, 2020, 11:59 p.m. UTC

Place	Prize Amount
1st	€12,000
2nd	€8,000
3rd	€5,000

Cash prizes will be awarded to the teams that share their solutions under an open source license per the Competition Rules. However, willingness to share solutions will not affect leaderboard positions: top performers will keep their leaderboard rankings regardless of whether they receive a cash prize.

Note: Prizes are delivered by DrivenData in USD, based on the exchange rate on September 10, 2020.

Organized by the French Society of Pathology

This challenge is organized by the French Society of Pathology (SFP), in partnership with the Health Data Hub. The challenge is sponsored by the "Grand Défi: Improvement of medical diagnoses through Artificial Intelligence" program led by the French Secretariat General for Investment.