Welcome to the Image Similarity Challenge! In this competition, you will be building models that help detect whether a given query image is derived from any of the images in a large reference set.

Content tracing is a crucial component of all social media platforms today, used for tasks such as flagging misinformation and manipulative advertising, preventing uploads of graphic violence, and enforcing copyright protections. But with billions of new images generated every day on sites like Facebook, manual content moderation just doesn't scale. Platforms depend on algorithms to help automatically flag or remove bad content.

This competition allows you to test your skills in building a key part of that content tracing system, and in so doing contribute to making social media more trustworthy and safe for the people who use it.


In this challenge, you will build a model that detects whether a given query image is derived from an image in a reference set.

There are two tracks to this challenge:

  • For the unconstrained Matching Track, your goal is to create a model that directly detects whether a query image is derived from one of the images in a large corpus of reference images.
  • For the constrained Descriptor Track (you are here!), your goal is to generate useful vector representations of images (up to 256 dimensions) for this task. These descriptors will be compared with Euclidean vector distance to detect whether a query image is derived from one of the images in a large corpus of reference images.
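As a rough illustration of how descriptors might be compared with Euclidean distance, here is a minimal sketch using NumPy. The function name `euclidean_match`, the constant `MAX_DIMS`, and the choice of negative distance as a matching score are assumptions made for this example; the official evaluation code may work differently.

```python
import numpy as np

MAX_DIMS = 256  # descriptor size limit stated for the Descriptor Track


def euclidean_match(query_desc, ref_desc):
    """Return, for each query, the index of the nearest reference and a score.

    The score is the negative Euclidean distance, so larger means more
    similar. This scoring convention is an assumption for illustration.
    """
    q = np.asarray(query_desc, dtype=np.float64)
    r = np.asarray(ref_desc, dtype=np.float64)
    if q.ndim != 2 or r.ndim != 2 or q.shape[1] != r.shape[1]:
        raise ValueError("descriptors must be 2-D with matching dimensions")
    if q.shape[1] > MAX_DIMS:
        raise ValueError(f"descriptors may have at most {MAX_DIMS} dimensions")

    # Pairwise distances via broadcasting: result has shape (n_query, n_ref).
    # This is fine for small arrays; a large corpus would need an index
    # or batching instead of a dense distance matrix.
    dists = np.linalg.norm(q[:, None, :] - r[None, :, :], axis=2)

    nearest = dists.argmin(axis=1)
    scores = -dists[np.arange(len(q)), nearest]
    return nearest, scores


# Toy data: three 2-D reference descriptors and two query descriptors.
refs = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]])
queries = np.array([[3.0, 4.1], [0.1, 0.0]])
nearest, scores = euclidean_match(queries, refs)
print(nearest, scores)
```

The dense distance matrix keeps the example short. For a reference set of realistic size, a nearest-neighbor search library would be the more practical choice.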

There are also two phases to this challenge:

  • Phase 1: Model Development (June - October 2021): Participants have access to the research dataset to develop and refine their models. Submissions may be made to the public leaderboard and are evaluated for the Phase 1 leaderboard. These scores will not determine the final leaderboard or the rankings for prizes.

  • Phase 2: Final Scoring (October 26, 2021 00:00 UTC to October 27, 2021 23:59 UTC): Participants will have the opportunity to make up to three submissions against a new, unseen test set. Performance against this new test set will be used to determine prizes.

External data: Pre-trained models and external data are explicitly allowed in this competition as long as the participant has a valid license for use in accordance with the Competition Rules. Top-performing participants will be required to certify in writing that they have permission to use all external data used to develop their submissions, and may be required to provide documentation demonstrating such permission to the satisfaction of the competition sponsor.

Teaming: All teams must be formed by October 19, 2021 23:59:59 UTC, prior to the beginning of Phase 2. Teams formed in Phase 1 will be locked and must remain unchanged in order to participate in Phase 2.

How to compete

  1. Click the “Compete” button in the sidebar to enroll in the competition.
  2. Get familiar with the problem through the overview and problem description. You might also want to reference some of the additional resources from the about page.
  3. Download the data from the data tab.
  4. Create and train your own model. The "Getting Started" blog post and about page are good places to start.
  5. Use your model to generate predictions that match the submission format.
  6. Click “Submit” in the sidebar, then “Make new submission”. You’re in!

Competition End Date: Oct. 27, 2021, 11:59 p.m. UTC

Matching Track

Place   Prize Amount (Phase 2)
1st     $50,000
2nd     $30,000
3rd     $20,000

Predicted matching scores for pairs of query and reference images are evaluated using micro-average precision.

Descriptor Track

Place   Prize Amount (Phase 2)
1st     $50,000
2nd     $30,000
3rd     $20,000

Predicted vector representations of images (up to 256 dimensions) are compared with Euclidean vector distance to generate matching scores as in the Matching Track, which are then evaluated using micro-average precision.
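To make the metric more concrete, here is a minimal sketch of micro-average precision as it is commonly defined: all predicted (query, reference) pairs are pooled across queries, ranked by score, and average precision is computed over that single global ranking. The function name, its arguments, and the tie handling are assumptions for this example and may not match the official evaluation script.

```python
import numpy as np


def micro_average_precision(scores, is_correct, n_ground_truth):
    """Average precision over one global ranking of all predicted pairs.

    scores:         matching score per predicted pair (higher = more confident)
    is_correct:     whether each predicted pair is a true match
    n_ground_truth: total number of true pairs, including any never predicted
    """
    if n_ground_truth <= 0:
        raise ValueError("n_ground_truth must be positive")
    scores = np.asarray(scores, dtype=float)
    is_correct = np.asarray(is_correct, dtype=bool)

    # Rank every pair from all queries together, highest score first.
    order = np.argsort(-scores, kind="stable")
    hits = is_correct[order]

    # Precision at each rank, counted only at ranks that hold a true match.
    cumulative_hits = np.cumsum(hits)
    ranks = np.arange(1, len(hits) + 1)
    precision_at_rank = cumulative_hits / ranks

    # Dividing by all ground-truth pairs penalizes matches that were missed.
    return float((precision_at_rank * hits).sum() / n_ground_truth)


# Toy example: four predicted pairs, two of them correct.
example = micro_average_precision(
    scores=[0.9, 0.8, 0.7, 0.6],
    is_correct=[True, False, True, False],
    n_ground_truth=2,
)
print(round(example, 4))
```

Because the ranking is global rather than per query, scores need to be comparable across queries: a confident wrong match for one query can lower the precision credited to correct matches for every other query.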

Note: Prizes will be awarded only to teams that share their solutions under an open source license per the Competition Rules. However, willingness to share solutions will not affect leaderboard positions: top performers will keep their leaderboard rankings regardless of whether they receive a cash prize.

Prizes generously supplied by Facebook AI.

The challenge is supported by Pinterest, BBC, Getty Images, iStock and Shutterstock.

NO PURCHASE NECESSARY TO ENTER/WIN. A PURCHASE WILL NOT INCREASE YOUR CHANCES OF WINNING. The Competition consists of two (2) Phases, with winners determined based upon Submissions using the Phase II dataset. The start and end dates and times for each Phase will be set forth on this Competition Website. Open to legal residents of the Territory, 18+ & age of majority. "Territory" means any country, state, or province where the laws of the US or local law do not prohibit participating or receiving a prize in the Challenge and excludes any area or country designated by the United States Treasury's Office of Foreign Assets Control (e.g. Cuba, Sudan, Crimea, Iran, North Korea, Syria, Venezuela). Any Participant use of External Data must be pursuant to a valid license. Void outside the Territory and where prohibited by law. Participation subject to official Competition Rules. Prizes: $50,000 USD (1st), $30,000 (2nd), $20,000 USD (3rd) for each of two tracks. See Official Rules and Competition Website for submission requirements, evaluation metrics and full details. Sponsor: Facebook, Inc., 1 Hacker Way, Menlo Park, CA 94025 USA.