A side-by-side comparison of two videos, showing a frame from a video on the left and the same frame manipulated with emojis on the right. (Credit: BrentOzar)

Overview

Welcome to Phase 2 of the Meta AI Video Similarity Challenge! In this competition, you will build models that detect whether a query video contains a possibly manipulated clip from one or more videos in a reference set.

The ability to identify and track content on social media platforms, called content tracing, is crucial to the experience of users on these platforms. Previously, Meta AI and DrivenData hosted the Image Similarity Challenge, in which participants developed state-of-the-art models capable of accurately detecting when an image was derived from a known image. The motivation for detecting copies and manipulations of videos is similar: enforcing copyright protections, identifying misinformation, and removing violent or objectionable content.

Manual content moderation struggles to scale to the large volume of content on platforms like Instagram and Facebook, where tens of thousands of hours of video are uploaded each day. Accurate and performant algorithms are critical in flagging and removing inappropriate content. This competition allows you to test your skills in building a key part of that content tracing system, and in so doing contribute to making social media more trustworthy and safe for the people who use it.

There are two tracks to this challenge:

For the Descriptor Track (you are here!), your goal is to generate useful vector representations of videos for this video similarity task. You will submit descriptors for both query and reference set videos. A standardized similarity search using pair-wise inner-product similarity will be used to generate ranked video match predictions.
For the Matching Track, your goal is to create a model that directly detects which specific clips of a query video correspond to which specific clips in one or more videos in a large corpus of reference videos. You will submit predictions indicating which portions of a query video are derived from a reference video.
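
To make the Descriptor Track scoring idea concrete, here is a minimal sketch of an exhaustive inner-product search over query and reference descriptors. It is not the platform's actual search code. The function names, the dictionary layout, and the choice to score a video pair by its best descriptor-to-descriptor match are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def inner_product(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("descriptors must share one dimensionality")
    return sum(x * y for x, y in zip(a, b))


def rank_video_pairs(
    query: Dict[str, List[Vector]],
    reference: Dict[str, List[Vector]],
) -> List[Tuple[str, str, float]]:
    """Score every (query video, reference video) pair and sort best-first.

    Each video maps to a non-empty list of descriptors. A pair's score is
    the maximum inner product over all descriptor pairs; that aggregation
    is an assumption of this sketch, not a documented challenge rule.
    """
    scored = []
    for q_id, q_vecs in query.items():
        for r_id, r_vecs in reference.items():
            best = max(inner_product(q, r) for q in q_vecs for r in r_vecs)
            scored.append((q_id, r_id, best))
    # Highest similarity first, giving a ranked list of match predictions.
    return sorted(scored, key=lambda row: row[2], reverse=True)
```

Calling rank_video_pairs with one query video and two reference videos returns one row per pair, ordered from most to least similar. At realistic scale a brute-force loop like this would be replaced by a vectorized or indexed search.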

There are also two phases to this challenge:

Phase 1: Model Development (Ended March 24, 2023 23:59 UTC): Participants have access to the research dataset to develop and refine their models. Submissions may be made to the public leaderboard and evaluated for the Phase 1 leaderboard. These scores will not determine the final leaderboard or the rankings for prizes.
Phase 2 (you are here!): Final Scoring (April 2, 2023 00:00 UTC to April 9, 2023 23:59 UTC): Participants will have the opportunity to make up to three submissions against a new, unseen test set. Performance against this new test set will be used to determine rankings for prizes.

If you or your team submitted a prize-eligible solution to Phase 1, you will be automatically added to the Phase 2 challenge. If you believe you submitted a prize-eligible solution in Phase 1 but were not added to Phase 2, please contact the DrivenData team via the forum or via info@drivendata.org.

External data: Pre-trained models and external data (except from YFCC100M) are allowed in this competition as long as the participant has a valid license for use in accordance with the Competition Rules. Top-performing participants will be required to certify in writing that they have permission to use all external data used to develop their submissions, and may be required to provide documentation demonstrating such permission to the satisfaction of the competition sponsor.

Teaming: All teams must have been formed by March 24, 2023 23:59 UTC, prior to the beginning of Phase 2. Teams will be locked after this deadline and must remain unchanged in order to participate in Phase 2.

The top teams will be invited to present their methodologies at CVPR!

This challenge will be featured at the Visual Copy Detection Workshop at CVPR 2023 in Vancouver, BC, Canada. The top three teams in each track will each be invited to present at this workshop on their approach to the competition!

End of Competition Phase 2:

April 9, 2023, 11:59 p.m. UTC

Place	Prize Amount (Phase 2)
1st	$25,000
2nd	$15,000
3rd	$10,000

Descriptor Track

Vector representations of videos (up to 512 dimensions and one vector per second of video) are compared with inner-product similarity to generate video match predictions, which are then evaluated using micro-average precision.
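
As a rough illustration of micro-average precision, the sketch below pools the predictions for all queries into one list, sorts it by score, and computes a single average precision over that list. The function name and the tuple layout are hypothetical, and the official scorer may treat ties and duplicate pairs differently.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def micro_average_precision(
    predictions: Iterable[Tuple[str, str, float]],
    ground_truth: Set[Pair],
) -> float:
    """Average precision over one globally ranked list of predictions.

    `predictions` holds (query_id, reference_id, score) rows with unique
    pairs; `ground_truth` holds the true (query_id, reference_id) matches.
    Ties are broken by input order, which is an assumption of this sketch.
    """
    if not ground_truth:
        raise ValueError("ground truth must contain at least one pair")
    ranked = sorted(predictions, key=lambda row: row[2], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (q_id, r_id, _score) in enumerate(ranked, start=1):
        if (q_id, r_id) in ground_truth:
            hits += 1
            # Precision at this rank, counted once per correct match.
            precision_sum += hits / rank
    # True matches that were never predicted contribute zero.
    return precision_sum / len(ground_truth)
```

Because all queries share one ranking, scores have to be comparable across queries: a confident wrong match for one query lowers the precision credited to every correct match ranked below it.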

Place	Prize Amount (Phase 2)
1st	$25,000
2nd	$15,000
3rd	$10,000

Matching Track

Predicted matches for pairs of query and reference video segments are evaluated using average precision across operating points of recall and precision, defined similarly to He et al.

How to compete

You have been added to this competition by DrivenData because you participated in Phase 1 and have already agreed to the data license agreement and rules in Phase 1, all of which also apply in Phase 2.
Download the Phase 2 query dataset from the data tab.
Use your trained model(s) from your selected Phase 1 submissions to generate a set of descriptors for the Phase 2 query videos. You must not change or re-train your Phase 1 models - you may only use these models to generate new query set descriptors.
Package your generated Phase 2 query set descriptors, along with the reference set descriptors and inference code you submitted for the same submission(s) in Phase 1, following the runtime repository specification for Phase 2 submissions.
Click Submissions on the sidebar followed by “Make new submission” to submit your descriptors and code as a zip archive for containerized execution.
We will generate your rank-ordered pairwise predictions from your submitted descriptors using an exhaustive similarity search on the platform, and use your code to generate query descriptors on a subset of the validation set to evaluate resource usage. Your code for Phase 2 will be given a small additional amount of code execution time to account for any minor variation in runtime.
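
Before packaging, it may help to check generated descriptors against the limits stated for the Descriptor Track (up to 512 dimensions and one vector per second of video). The sketch below is one hypothetical way to do that. The function name, the input layout, and the rounding of fractional durations up to the next whole second are assumptions, so the runtime repository specification remains the authority on the exact rules.

```python
import math
from typing import List, Sequence

MAX_DIMENSIONS = 512  # limit stated in the challenge description


def check_descriptors(
    descriptors: Sequence[Sequence[float]],
    duration_seconds: float,
) -> List[str]:
    """Return a list of problems found for one video (empty if none)."""
    problems = []
    # Assumption: a partial final second still allows one descriptor.
    max_count = max(1, math.ceil(duration_seconds))
    if len(descriptors) > max_count:
        problems.append(
            f"{len(descriptors)} descriptors exceed the limit of "
            f"{max_count} for a {duration_seconds}s video"
        )
    dims = {len(vec) for vec in descriptors}
    if len(dims) > 1:
        problems.append(f"inconsistent dimensionalities: {sorted(dims)}")
    if any(d > MAX_DIMENSIONS for d in dims):
        problems.append(f"dimensionality above {MAX_DIMENSIONS}")
    return problems
```

Running this per video over the Phase 2 query set and stopping on the first non-empty result is a cheap guard, since Phase 2 allows only up to three submissions.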

Prize generously supplied by Meta AI.

NO PURCHASE NECESSARY TO ENTER/WIN. A PURCHASE WILL NOT INCREASE YOUR CHANCES OF WINNING. The Competition consists of two (2) Phases, with winners determined based upon Submissions using the Phase II dataset. The start and end dates and times for each Phase will be set forth on this Competition Website. Open to legal residents of the Territory, 18+ & age of majority. "Territory" means any country, state, or province where the laws of the US or local law do not prohibit participating or receiving a prize in the Challenge and excludes any area or country designated by the United States Treasury's Office of Foreign Assets Control (e.g. Crimea, Donetsk, and Luhansk regions of Ukraine, Cuba, North Korea, Iran, Syria), Russia and Belarus. Any Participant use of External Data must be pursuant to a valid license. Void outside the Territory and where prohibited by law. Participation subject to official Competition Rules. Prizes: $25,000 USD (1st), $15,000 (2nd), $10,000 USD (3rd) for each of two tracks. See Official Rules and Competition Website for submission requirements, evaluation metrics and full details. Sponsor: Meta Platforms, Inc., 1 Hacker Way, Menlo Park, CA 94025 USA.

Meta AI Video Similarity Challenge: Descriptor Track | Phase 2
