Meta AI Video Similarity Challenge: Descriptor Track (Open Arena)

Help keep social media safe by identifying whether a video contains a manipulated clip from one or more videos in a reference set.

A side-by-side comparison of two videos showing a frame from a video on the left and the same frame manipulated with emojis on the right. Credit: BrentOzar

Video Similarity Challenge Open Arena

Welcome to the open arena for the Descriptor Track of the Meta AI Video Similarity Challenge! The prize-awarding portion of this competition took place between December 2022 and April 2023. You can read about the winners of the competition here and see the original results here. This version of the competition does not have any prizes — participation is just for your own learning and enjoyment!

The ability to identify and track content on social media platforms, called content tracing, is crucial to users' experience on those platforms. Previously, Meta AI and DrivenData hosted the Image Similarity Challenge, in which participants developed state-of-the-art models capable of accurately detecting when an image was derived from a known image. The motivation for detecting copies and manipulations in videos is similar: enforcing copyright protections, identifying misinformation, and removing violent or objectionable content.

Manual content moderation struggles to scale to the volume of content on platforms like Instagram and Facebook, where tens of thousands of hours of video are uploaded each day. Accurate and performant algorithms are critical for flagging and removing inappropriate content. This competition lets you test your skills at building a key part of such a content tracing system, and in doing so contribute to making social media more trustworthy and safe for the people who use it.

There are two tracks to this challenge:

  • For the Descriptor Track (you are here!), your goal is to generate useful vector representations (descriptors) of videos for this similarity task. You will produce descriptors for both the query and reference set videos; a standardized similarity search using pairwise inner-product similarity then turns those descriptors into video match predictions (see the sketch after this list).
  • For the Matching Track, your goal is to create a model that directly detects which specific clips of a query video correspond to which specific clips in one or more videos in a large corpus of reference videos. You will submit predictions indicating which portions of a query video are derived from a reference video.
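
To make the Descriptor Track's standardized search concrete, here is a minimal NumPy sketch that assumes one L2-normalized descriptor per video. In practice each video may contribute several descriptors, and all array names and shapes below are illustrative rather than the challenge's actual format.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative descriptors: 100 query videos and 5,000 reference videos,
# each represented by a single 512-dimensional vector.
query_desc = rng.standard_normal((100, 512)).astype(np.float32)
ref_desc = rng.standard_normal((5000, 512)).astype(np.float32)

# L2-normalize so that the inner product equals cosine similarity.
query_desc /= np.linalg.norm(query_desc, axis=1, keepdims=True)
ref_desc /= np.linalg.norm(ref_desc, axis=1, keepdims=True)

# Pairwise inner-product similarity matrix of shape (n_query, n_ref).
scores = query_desc @ ref_desc.T

# For each query video, surface the best-scoring reference as a candidate match.
best_ref = scores.argmax(axis=1)
best_score = scores.max(axis=1)
for q in range(3):
    print(f"query {q} -> reference {best_ref[q]} (score {best_score[q]:.3f})")
```

At competition scale, a brute-force matrix product like this is usually replaced with an approximate nearest-neighbor index, but the scoring quantity, the inner product between descriptor vectors, is the same.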

How to explore the arena

  1. Click the “Compete” button on the sidebar to register for the competition.
  2. Get familiar with the problem on the problem description page. You might also want to reference additional resources available on the about page and the user forum.
  3. Download the data from the data tab (you will need to be registered).
  4. Create and train your own model to generate descriptors for query and reference videos.
  5. Use the provided descriptor evaluation script to generate predicted matching pairs from your descriptors, ensuring your submission adheres to the submission format guidelines (see the packaging sketch after this list).
  6. Click Submissions on the sidebar followed by “Make new submission” to make your submission.
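
For step 5, the sketch below shows one plausible way to package descriptors for the evaluation script. The file name, key names, and dtypes here are assumptions for illustration only; the authoritative layout is defined in the submission format guidelines on the problem description page.

```python
import numpy as np

rng = np.random.default_rng(0)
n_videos, dims = 10, 512

# Hypothetical arrays -- check the official submission format guidelines
# for the actual required keys, dtypes, and descriptor limits.
video_ids = np.array([f"Q{100000 + i}" for i in range(n_videos)])
# One (start, end) interval in seconds per descriptor.
timestamps = np.stack(
    [np.arange(n_videos, dtype=np.float32),
     np.arange(1, n_videos + 1, dtype=np.float32)],
    axis=1,
)
features = rng.standard_normal((n_videos, dims)).astype(np.float32)

np.savez(
    "query_descriptors.npz",
    video_ids=video_ids,
    timestamps=timestamps,
    features=features,
)

# Reload to sanity-check what was written before submitting.
with np.load("query_descriptors.npz") as f:
    print({key: f[key].shape for key in f.files})
```

A quick reload-and-inspect pass like the one at the end is cheap insurance against shape or dtype mismatches before you upload.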