Competition: PHASE 1 | Facebook AI Image Similarity Challenge: Matching Track

The methods developed by the contestants are of high quality and set a new standard in research and for the industry in the field of image copy detection.
— Matthijs Douze, Facebook AI Research Scientist and Image Similarity Challenge author

Why

Copy detection is a crucial component on all social media platforms today, used for such tasks as flagging misinformation and manipulative advertising, preventing uploads of graphic violence, and enforcing copyright protections. But when dealing with the billions of new images generated every day on sites like Facebook, manual content moderation just doesn't scale. We need algorithms to help automatically flag or remove bad content.

This competition allowed participants to test their skills in building a key part of that content moderation system, and in doing so to help make social media more trustworthy and safe for the people who use it. For more information, check out the competition paper from Facebook AI.

The Solution

In this challenge, participants had access to three archives of competition images:

1 million reference images
50K query images, a subset of which were derived from the reference images
1 million training images, statistically similar to but distinct from the reference archive

The core task in the Matching Track was to determine for each query image whether it originated from one of the reference images and assign a confidence score indicating its similarity to the candidate reference image. The end goal was similar for the Descriptor Track, but in this case participants submitted the image embeddings for all query and reference images, with a similarity search and submission score computed automatically on the competition platform.
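
To make the Descriptor Track idea concrete, here is a minimal sketch of a similarity search over submitted embeddings. It is an illustration only: the function name `top_matches`, the use of cosine similarity, the brute-force search, and the random toy data are all assumptions, and the competition platform's actual search procedure may differ.

```python
import numpy as np


def top_matches(query_emb, ref_emb, k=1):
    """Return (ref_indices, scores) of the k most similar references per query.

    Hypothetical helper: uses brute-force cosine similarity, which is an
    assumption and not necessarily what the competition platform computed.
    """
    # L2-normalize rows so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sims = q @ r.T  # shape: (num_queries, num_refs)
    # Sort each row by descending similarity and keep the first k columns.
    idx = np.argsort(-sims, axis=1)[:, :k]
    scores = np.take_along_axis(sims, idx, axis=1)
    return idx, scores


# Toy data: 100 random "reference" embeddings, and two queries that are
# slightly perturbed copies of references 3 and 42.
rng = np.random.default_rng(0)
refs = rng.normal(size=(100, 16))
queries = refs[[3, 42]] + 0.01 * rng.normal(size=(2, 16))

idx, scores = top_matches(queries, refs, k=1)
for q_id, (r_id, score) in enumerate(zip(idx[:, 0], scores[:, 0])):
    print(f"query {q_id} -> reference {r_id} (score {score:.4f})")
```

At the scale of the real archives (1 million reference images), a brute-force matrix product like this would typically be replaced by a dedicated nearest-neighbor search library, but the output has the same shape: one candidate reference and one confidence score per query.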

The Results

Between June and October 2021, 1,236 participants from 80 countries signed up to solve the problems posed by the two tracks. One goal of the competition sponsors at Facebook AI was to create an opportunity for participants to explore self-supervised learning (SSL) techniques, which turned out to be a key component across all of the winning solutions. Ultimately, the winning solutions vastly outperformed the competition baseline methods, achieving micro average precision scores of 0.8329 and 0.6354 on the Matching and Descriptor tracks, respectively.
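
For readers unfamiliar with the metric, the sketch below shows one common way to compute micro average precision: all predicted (query, reference) pairs are pooled across queries, ranked by confidence, and precision is accumulated at each correct prediction. The function name, the input format, and the toy data are assumptions for illustration, and details such as tie handling may differ from the official evaluation code.

```python
def micro_average_precision(predictions, ground_truth):
    """Compute average precision over all predictions pooled together.

    predictions: list of (query_id, ref_id, score) tuples, no duplicates.
    ground_truth: set of (query_id, ref_id) pairs that are true matches.
    Hypothetical helper; the official scoring script may differ in details.
    """
    total_positives = len(ground_truth)
    if total_positives == 0:
        return 0.0
    # Rank every predicted pair, across all queries, by descending confidence.
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    true_positives = 0
    ap = 0.0
    for rank, (query_id, ref_id, _score) in enumerate(ranked, start=1):
        if (query_id, ref_id) in ground_truth:
            true_positives += 1
            precision_at_rank = true_positives / rank
            # Recall increases by 1 / total_positives at each correct pair.
            ap += precision_at_rank / total_positives
    return ap


# Toy example: two true matches, three predictions (one of them wrong).
truth = {("Q1", "R1"), ("Q2", "R2")}
preds = [("Q1", "R1", 0.9), ("Q3", "R5", 0.8), ("Q2", "R2", 0.7)]
map_score = micro_average_precision(preds, truth)
print(round(map_score, 4))  # (1/1 + 2/3) / 2 = 0.8333
```

Because the ranking is global rather than per query, a confident wrong prediction for one query lowers the precision counted for every correct prediction ranked below it, which is why well-calibrated confidence scores matter for this metric.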


All the prize-winning solutions from this competition have been released under an open source license, along with academic write-ups on arXiv. The dataset is also available for ongoing practice and learning.

RESULTS ANNOUNCEMENT + MEET THE WINNERS

WINNING MODELS ON GITHUB

DISC21 DATASET