TissueNet: Detect Lesions in Cervical Biopsies

The objective of this challenge is to detect the most severe epithelial lesions of the uterine cervix. The challenge brings together thousands of microscopic slides from different medical centers across France.

€25,000 in prizes
October 2020
539 participants joined

Sample of a cervical biopsy

The French Society of Pathology and the Health Data Hub are very enthusiastic about the data challenge results! They show that AI will sooner or later be part of the tools that pathologists use on a daily basis.

— Dr. Philippe Bertheau, President of the French Society of Pathology

Why

A biopsy is a sample of tissue examined at a microscopic level to diagnose cancer or signs of pre-cancer. Digital pathology has developed considerably over the past decade as it has become possible to work with digitized "whole slide images" (WSIs). These very large image files contain all the information required to diagnose lesions as malignant or benign, yet they present major challenges to use effectively.

This challenge focused on epithelial lesions of the uterine cervix and featured a unique collection of thousands of expert-labeled WSIs collected from medical centers across France. This is a sizable dataset (700 GB) of extremely high-resolution images. Given the scale of the dataset, handling the data efficiently is a critical problem to solve in the process of developing an accurate approach to diagnosis.
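
To give a sense of the data-handling problem, the sketch below (assuming the openslide-python bindings, a hypothetical slide file, and tile coordinates chosen purely for illustration) reads a heavily downsampled overview and a single full-resolution tile from a WSI without ever loading the whole image into memory.

```python
import openslide  # requires the OpenSlide C library and the openslide-python bindings

# Hypothetical path to one digitized whole slide image
slide = openslide.OpenSlide("example_biopsy.tif")

# WSIs are stored as multi-resolution pyramids; level 0 is the full resolution.
print("Pyramid levels:", slide.level_count)
print("Level dimensions:", slide.level_dimensions)

# Read a small, heavily downsampled overview of the entire slide, which is
# useful for deciding which regions deserve a closer look.
overview_level = slide.level_count - 1
overview = slide.read_region(
    (0, 0), overview_level, slide.level_dimensions[overview_level]
).convert("RGB")

# Read one 512x512 tile at full resolution; coordinates are always given in
# level-0 pixel space.
tile = slide.read_region((10_000, 10_000), 0, (512, 512)).convert("RGB")

slide.close()
```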

The Solution

In this competition, participants were tasked with building machine learning models that could predict the most severe lesions in each digital biopsy slide. What's more, participants needed to submit code for executing their solution on test data in the cloud, ensuring that the model could run fast enough on this large-scale data to be useful in practice. This setup rewards models that perform well on unseen images and brings these innovations one step closer to impact.
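
As a simplified illustration of "predicting the most severe lesion in each slide", one common pattern in digital pathology is to classify many tiles cropped from a slide and report the worst class found. The sketch below assumes a hypothetical classify_tile function returning an integer severity class from 0 (least severe) to 3 (most severe); it is not a description of the winning solutions.

```python
def predict_slide_severity(tiles, classify_tile, worst_class=3):
    """Aggregate tile-level predictions into a single slide-level label.

    tiles: an iterable of image tiles cropped from one whole slide image.
    classify_tile: a model wrapper returning an integer severity class,
        here assumed to range from 0 (least severe) to ``worst_class``.
    """
    most_severe = 0
    for tile in tiles:
        most_severe = max(most_severe, classify_tile(tile))
        if most_severe == worst_class:
            # The worst possible class was found; no need to scan further,
            # which also helps keep inference time down on very large slides.
            break
    return most_severe
```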

The Results

The winning solutions used clever approaches to prioritize which parts of each slide to analyze further, and built computer vision pipelines to determine the most appropriate diagnosis for the selected tissue. Models were scored not just on their accuracy, but also on the impact of their errors, with larger penalties for mistakes that have worse consequences in practice.
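
To make the scoring idea concrete, the sketch below implements an error-weighted metric with an illustrative penalty matrix. The penalty values are assumptions for demonstration, not the competition's official matrix; the property they capture is that misclassifications far from the true class, and under-calls of severe lesions in particular, cost much more than near-misses.

```python
import numpy as np

# Illustrative 4x4 penalty matrix: rows = true class, columns = predicted class,
# for classes 0 (least severe) through 3 (most severe). These values are
# assumptions, not the official competition penalties.
PENALTY = np.array([
    [0.0, 0.1, 0.6, 0.9],
    [0.1, 0.0, 0.3, 0.6],
    [0.7, 0.3, 0.0, 0.3],
    [1.0, 0.7, 0.3, 0.0],
])

def mean_penalty(y_true, y_pred):
    """Average penalty over all slides: 0.0 is a perfect score, 1.0 the worst."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(PENALTY[y_true, y_pred].mean())
```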

The top-performing model achieved over 76% accuracy in predicting the exact severity label of each slide across four ranked classes, including 95% accuracy for the most severe class of cancerous tissue. In addition, the top three solutions achieved over 98% on-or-adjacent accuracy, meaning that fewer than 2% of the 1,500+ test slides were misclassified by more than one severity class, the most costly kind of error.
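
For reference, the "on-or-adjacent accuracy" quoted above can be computed directly from ordered class labels; the snippet assumes integer labels 0 through 3 ordered by severity.

```python
import numpy as np

def on_or_adjacent_accuracy(y_true, y_pred):
    """Fraction of slides predicted within one severity class of the truth."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float((np.abs(y_true - y_pred) <= 1).mean())
```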

All prize-winning solutions are available under an open source license for ongoing use and learning.
