Meta AI Video Similarity Challenge

A side-by-side comparison of two videos showing a frame from a video on the left and the same frame manipulated with emojis on the right. Credit: BrentOzar


Welcome to the Video Similarity Challenge! In this competition, you will build models that detect whether a query video contains a possibly manipulated clip from one or more videos in a reference set.

The ability to identify and track content on social media platforms, called content tracing, is crucial to the experience of users on these platforms. Previously, Meta AI and DrivenData hosted the Image Similarity Challenge, in which participants developed state-of-the-art models capable of accurately detecting when an image was derived from a known image. The motivation for detecting copies and manipulations in videos is similar: enforcing copyright protections, identifying misinformation, and removing violent or objectionable content.

Manual content moderation struggles to scale to the volume of content on platforms like Instagram and Facebook, where tens of thousands of hours of video are uploaded each day. Accurate and performant algorithms are critical for flagging and removing inappropriate content. This competition lets you test your skills in building a key part of that content tracing system, and in doing so contribute to making social media more trustworthy and safe for the people who use it.

There are two tracks to this challenge:

  • For the Descriptor Track, your goal is to generate useful vector representations (descriptors) of videos for the similarity task. You will submit descriptors for both the query and reference videos. A standardized similarity search using pairwise inner-product similarity will then be used to generate ranked video match predictions.
  • For the Matching Track, your goal is to create a model that directly detects which specific clips of a query video correspond to which specific clips in one or more videos in a large corpus of reference videos. You will submit predictions indicating which portions of a query video are derived from a reference video.
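
To make the Descriptor Track setup concrete, here is a minimal sketch of an inner-product similarity search, assuming NumPy and toy descriptor sizes; the actual evaluation pipeline and ranking details belong to the competition's own code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy descriptors: one 512-dim vector per second of video (illustrative sizes).
query_desc = rng.normal(size=(10, 512)).astype(np.float32)  # 10-second query video
ref_desc = rng.normal(size=(30, 512)).astype(np.float32)    # 30-second reference video

# Pairwise inner-product similarity between every query second
# and every reference second: sim has shape (10, 30).
sim = query_desc @ ref_desc.T

# For each query second, find the most similar reference second.
best_ref_second = sim.argmax(axis=1)
best_score = sim.max(axis=1)
print(best_ref_second.shape, best_score.shape)
```

In the real challenge the reference set contains many videos, so the same matrix product is run against a much larger pool of reference descriptors, and the top-scoring pairs form the ranked match predictions.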

There are also two phases to this challenge:

  • Phase 1: Model Development (December 2, 2022 00:00 UTC - March 24, 2023 23:59 UTC): Participants have access to the research dataset to develop and refine their models. Submissions are scored and displayed on the Phase 1 public leaderboard, but these scores will not determine the final rankings or prizes.

  • Phase 2: Final Scoring (April 2, 2023 00:00 UTC - April 9, 2023 23:59 UTC): Participants may make up to three submissions against a new, unseen test set. Performance on this test set will determine the final rankings and prizes.

The top teams will be invited to present their methodologies at CVPR!

This challenge will be featured at the Visual Copy Detection Workshop at CVPR 2023 in Vancouver, BC, Canada. The top three teams in each track will be invited to present their approaches at the workshop!

Click on a track below and sign up to get started!

Descriptor Track

Vector representations of videos (up to 512 dimensions, with one vector per second of video) are compared with inner-product similarity to generate video match predictions, which are then evaluated using micro-average precision.

Place Prize Amount (Phase 2)
1st $25,000
2nd $15,000
3rd $10,000
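
For intuition, micro-average precision pools all candidate (query, reference) pairs into a single global ranking and computes average precision over that ranking. A minimal sketch in Python/NumPy (illustrative only, not the official evaluation code):

```python
import numpy as np

def micro_average_precision(scores, labels):
    """Average precision over one global ranking of all candidate
    (query, reference) pairs, pooled across every query video."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels, dtype=float)[order]
    cum_tp = np.cumsum(labels)
    precision = cum_tp / (np.arange(len(labels)) + 1)
    # Average the precision at each true-positive rank over all positives.
    return float((precision * labels).sum() / labels.sum())

# Toy example: 5 candidate pairs, 2 of which are true matches.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
print(round(micro_average_precision(scores, labels), 3))  # → 0.833
```

Because the ranking is global rather than per-query, confident false matches on one query can crowd out true matches on another, which rewards well-calibrated similarity scores.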

Matching Track

Predicted matches between pairs of query and reference video segments are evaluated using average precision across operating points of recall and precision, defined similarly to He et al.

Place Prize Amount (Phase 2)
1st $25,000
2nd $15,000
3rd $10,000
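
As a toy illustration of segment-level matching (not the official He et al. evaluation), a predicted temporal segment can be compared against an annotated one with temporal intersection-over-union:

```python
def temporal_iou(pred, truth):
    """Intersection-over-union of two time intervals (start, end), in seconds.
    A simplified stand-in for segment-level match scoring."""
    start = max(pred[0], truth[0])
    end = min(pred[1], truth[1])
    inter = max(0.0, end - start)
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0

# Predicted clip boundaries vs. the annotated ground-truth clip.
print(temporal_iou((2.0, 10.0), (4.0, 12.0)))  # → 0.6
```

A Matching Track submission pairs each detected query segment with the reference segment it is derived from, so localization quality in both videos directly affects the score.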

Prizes generously supplied by Meta AI.