Meta AI Video Similarity Challenge: Matching Track | Phase 2

Help keep social media safe by identifying whether a video contains a manipulated clip from one or more videos in a reference set.

$50,000 in prizes
April 2023

Phase 2 Submission

For the Phase 2 Matching Track, your task is to generate predicted matches for each video in a new, unseen set of query videos. Each match consists of a query-reference video pair, timestamps marking the start and end of the derived content in both the query and reference videos, and a confidence score. The reference set of videos for Phase 2 is the same as the Phase 1 test reference dataset, and is hereafter referred to simply as the reference set. The metric for this task is micro-average precision.

This page contains the information you need to obtain Phase 2 data and successfully submit your Phase 2 submissions. For more detailed information, please reference the Problem Description and Code Submission Format pages from Phase 1. Note that the rules on data use from Phase 1 still apply to Phase 2, since the models you use in Phase 2 must be the same models that you trained in Phase 1.

Phase 2 Dataset

The Phase 2 query corpus consists of 8,015 query videos, with filenames ranging from Q300001.mp4 to Q308015.mp4. These new query videos may or may not contain content derived from the reference set of videos.

To access the Phase 2 data, download the data following the instructions in the data tab. The corpus is large, so this might take a little while. Make sure you have a stable internet connection.

You will be able to download the following:

  • All of the data from Phase 1
  • The Phase 2 query test dataset, containing:
    • query corpus containing 8,015 query video mp4 files. Filenames correspond to query video ids, ranging from Q300001.mp4 to Q308015.mp4
    • test_query_metadata.csv which contains metadata for the Phase 2 query videos
    • test_reference_metadata.csv which contains metadata for the set of reference videos (identical to the Phase 1 test reference set metadata file)
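If you want a quick sanity check that the download completed, the following minimal sketch verifies that all 8,015 query files are present. The local directory path is hypothetical; point it at wherever you unpacked the corpus.

```python
import os

def check_query_corpus(query_dir: str) -> list[str]:
    """Return the sorted list of expected query files missing from query_dir.

    Query video ids run from Q300001.mp4 to Q308015.mp4 (8,015 files).
    """
    expected = {f"Q{300000 + i}.mp4" for i in range(1, 8016)}
    present = set(os.listdir(query_dir)) if os.path.isdir(query_dir) else set()
    return sorted(expected - present)

# Hypothetical path to the downloaded Phase 2 query corpus.
missing = check_query_corpus("data/phase2/query")
print(f"{8015 - len(missing)} of 8015 query videos found")
```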

Eligible Phase 2 Submissions


Prior to the end of Phase 1, you were prompted to submit information about the models you would be using in Phase 2. To be eligible for final prizes in Phase 2, the models and code you submit in Phase 2 must be identical to those from your selected Phase 1 submissions. As noted above, you will not be allowed to re-train your Phase 1 model in Phase 2; you will simply apply your existing Phase 1 model to a new dataset. It is your responsibility to ensure that your Phase 2 submissions contain identical models and code to those you have chosen to carry forward from Phase 1. If any changes must be made for your code to execute successfully in the code execution environment, please document them. DrivenData has the discretion to determine whether such changes are permissible.

As in Phase 1, Phase 2 submissions will be required to include models and code to perform inference on a subset of query videos. This subset will contain the same number of videos as the analogous subset from Phase 1. Submissions are expected to meet the same 10-seconds-per-query-video average runtime constraint. The overall time limit will include a small margin to allow for minor unexpected variability in runtime performance, but you should plan to have your solution meet the same constraint for the Phase 2 test set.
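To gauge locally whether your solution fits the runtime budget, a rough timing harness like the following can help. The `run_inference` call is a placeholder for your own per-video inference code, and the video ids are illustrative only.

```python
import time

PER_VIDEO_BUDGET_S = 10.0  # average allowed seconds per query video

def within_budget(runtimes_s: list[float],
                  budget_s: float = PER_VIDEO_BUDGET_S) -> bool:
    """True if the mean per-video runtime stays within the budget."""
    return sum(runtimes_s) / len(runtimes_s) <= budget_s

# Time your own inference over a few local query videos (placeholder ids).
runtimes = []
for video in ["Q300001.mp4", "Q300002.mp4"]:
    start = time.perf_counter()
    # run_inference(video)  # your model's per-video inference call goes here
    runtimes.append(time.perf_counter() - start)

print("average s/video:", sum(runtimes) / len(runtimes))
print("within budget:", within_budget(runtimes))
```

Note that local timings are only a rough guide; the code execution environment's hardware may differ from yours.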

Submission Format


Just like Phase 1, Phase 2 is structured as a hybrid code execution challenge! In addition to submitting your new predicted matches for the Phase 2 query videos against the Phase 1 test reference videos, you'll package everything needed to do inference and submit that for containerized execution on a subset of the Phase 2 query set.

Your Phase 2 submission should have a structure identical to your Phase 1 submission and should mostly contain identical files. Only your full_matches.csv file should differ.

submission.zip                    # this is what you submit
├── full_matches.csv              # csv file containing Phase 2 query set matches
├── main.py                       # your script that will generate matches for
│                                 #   a subset of test set query videos
└── model_assets/                 # any assets required by main.py script, like a
                                  #   model checkpoint

Your leaderboard score will be computed using the submitted full_matches.csv file. For more detail on submission structure, see the Phase 1 Code Submission Format page.
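For illustration, here is a minimal sketch of writing a full_matches.csv and packaging it into submission.zip. The column names follow the Phase 1 matching-track convention as I understand it, so verify them against the Phase 1 Code Submission Format page before submitting; the row values below are made up.

```python
import csv
import zipfile

# Assumed column names -- confirm against the Phase 1 submission format docs.
COLUMNS = ["query_id", "ref_id", "query_start", "query_end",
           "ref_start", "ref_end", "score"]

# One illustrative predicted match (made-up ids, timestamps, and score).
rows = [
    {"query_id": "Q300001", "ref_id": "R123456",
     "query_start": 3.2, "query_end": 11.7,
     "ref_start": 40.0, "ref_end": 48.5, "score": 0.87},
]

with open("full_matches.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)

# Package the csv alongside your inference script and model assets.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("full_matches.csv")
    # zf.write("main.py")
    # zf.write("model_assets/checkpoint.pt")  # and any other assets
```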

Performance Metric


Just as in Phase 1, submissions will be evaluated by a segment-matching version of micro-average precision. See the Phase 1 description for more detail.
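For rough intuition about how a ranking metric like this behaves, here is a simplified pair-level micro-average precision over pooled predictions. The official metric additionally scores segment (timestamp) overlap, so treat this only as a sketch, not a reimplementation of the evaluation code.

```python
def micro_average_precision(preds, positives):
    """Simplified pair-level micro-average precision.

    preds: list of (query_id, ref_id, score) tuples pooled across all queries.
    positives: set of (query_id, ref_id) ground-truth matching pairs.

    Ranks all predictions by score and averages precision at each
    correctly retrieved pair. (The official segment-matching metric
    also accounts for timestamp overlap; see the Phase 1 description.)
    """
    ranked = sorted(preds, key=lambda p: p[2], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (q, r, _) in enumerate(ranked, start=1):
        if (q, r) in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / max(len(positives), 1)
```

Because every prediction contributes to one pooled ranking, confident wrong matches ranked above correct ones lower the score, which is why calibrating the confidence column matters.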

Good luck!


Good luck and enjoy this problem! If you have any questions you can always visit the user forum!


NO PURCHASE NECESSARY TO ENTER/WIN. A PURCHASE WILL NOT INCREASE YOUR CHANCES OF WINNING. The Competition consists of two (2) Phases, with winners determined based upon Submissions using the Phase II dataset. The start and end dates and times for each Phase will be set forth on this Competition Website. Open to legal residents of the Territory, 18+ & age of majority. "Territory" means any country, state, or province where the laws of the US or local law do not prohibit participating or receiving a prize in the Challenge and excludes any area or country designated by the United States Treasury's Office of Foreign Assets Control (e.g. Crimea, Donetsk, and Luhansk regions of Ukraine, Cuba, North Korea, Iran, Syria), Russia and Belarus. Any Participant use of External Data must be pursuant to a valid license. Void outside the Territory and where prohibited by law. Participation subject to official Competition Rules. Prizes: $25,000 USD (1st), $15,000 USD (2nd), $10,000 USD (3rd) for each of two tracks. See Official Rules and Competition Website for submission requirements, evaluation metrics and full details. Sponsor: Meta Platforms, Inc., 1 Hacker Way, Menlo Park, CA 94025 USA.