Meta AI Video Similarity Challenge: Matching Track | Phase 1

Help keep social media safe by identifying whether a video contains a manipulated clip from one or more videos in a reference set.

Code Submission Format


This is a hybrid code execution challenge. If you've never participated in a code execution challenge before, don't be intimidated! We make it as easy as possible for you.

You will be submitting your predictions in a full_matches.csv file, as well as a script called main.py that will generate predictions from a subset of test set queries. Here's what your submission will look like at a high level:

submission.zip                    # this is what you submit
├── full_matches.csv              # CSV file containing your predictions
├── main.py                       # your script that will generate predictions  
│                                 #   for a subset of test set query videos
└── model_assets/                 # any assets required by main.py script  
                                  #   like a model checkpoint

Your leaderboard score will be computed using your full_matches.csv predictions, and our compute cluster will also run your main.py to ensure that your submission runs successfully within the allotted time of 10 seconds per query. A score will be generated for the performance of the code-generated predictions on the test subset, but this score will not determine your leaderboard ranking.

What you submit


You will submit a zip archive (e.g., submission.zip). The root level of the archive must contain your full_matches.csv predictions file and your main.py script.
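
If it helps, here is a minimal packaging sketch using Python's standard library, assuming full_matches.csv, main.py, and a model_assets/ directory sit in your current working directory (adapt the paths to your own layout):

import zipfile
from pathlib import Path

# full_matches.csv and main.py must sit at the root of the archive;
# model_assets/ is optional and keeps its directory structure
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("full_matches.csv", arcname="full_matches.csv")
    zf.write("main.py", arcname="main.py")
    for path in Path("model_assets").rglob("*"):
        zf.write(path, arcname=path.as_posix())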

full_matches.csv

This is a CSV file containing your predicted matches between query and reference videos, including timestamps for the overlapping intervals and a score for each match.

See the Submission Format page for a full explanation of what this should look like. You must generate this file locally and package it in your submission.zip; it is not generated during code execution.
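
As a purely hypothetical illustration of producing such a file with pandas (the column names and values below are placeholders; the Submission Format page defines the authoritative schema):

import pandas as pd

# Hypothetical columns: one row per predicted query/reference match with
# the overlapping intervals (in seconds) and a confidence score
matches = pd.DataFrame(
    [
        {
            "query_id": "Q100005",   # a query video from the test set
            "ref_id": "R123456",     # hypothetical reference video id
            "query_start": 0.0,
            "query_end": 12.5,
            "ref_start": 30.0,
            "ref_end": 42.5,
            "score": 0.87,
        }
    ]
)
matches.to_csv("full_matches.csv", index=False)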

main.py script

Our compute cluster will execute your main.py script to measure computational costs and performance on the test subset. Your main.py script just needs to write out predictions for a subset of the test data in the same format as the full_matches.csv file you are already submitting.

Here is a pseudocode version of main.py which shows how the script can iterate through the subset of test queries and write out a new predictions file:

from pathlib import Path

import pandas as pd

ROOT_DIRECTORY = Path("/code_execution")
DATA_DIRECTORY = Path("/data")
OUTPUT_FILE = ROOT_DIRECTORY / "submission" / "subset_matches.csv"


def generate_matches(query_video_ids) -> pd.DataFrame:
    # Replace this placeholder with your own inference code, returning a
    # DataFrame of predicted matches for the given query video ids
    raise NotImplementedError(
        "This script is just a template. You should adapt it with your own code."
    )


def main():
    # Load the ids of the subset of test set query videos
    query_subset = pd.read_csv(DATA_DIRECTORY / "query_subset.csv")
    query_subset_video_ids = query_subset.video_id.values

    # Generate predicted matches for the query subset
    matches = generate_matches(query_subset_video_ids)

    matches.to_csv(OUTPUT_FILE, index=False)


if __name__ == "__main__":
    main()

You will be responsible for implementing a version of this script that generates predictions using your own model assets. Model assets, such as a checkpoint file, can be included in your .zip submission.
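
As a rough sketch of what an adapted generate_matches might look like, assuming a PyTorch checkpoint (the checkpoint.pt filename and run_matching helper below are hypothetical; use whatever framework and structure your solution actually needs):

from pathlib import Path

import pandas as pd
import torch  # illustrative; substitute your own framework

ROOT_DIRECTORY = Path("/code_execution")
DATA_DIRECTORY = Path("/data")
MODEL_PATH = ROOT_DIRECTORY / "model_assets" / "checkpoint.pt"  # hypothetical


def run_matching(model, video_id, video_path):
    # Hypothetical stub: replace with your descriptor extraction and
    # matching logic, returning one dict per predicted match
    return []


def generate_matches(query_video_ids) -> pd.DataFrame:
    # Load weights packaged inside the submission zip; the container has
    # no network access, so all assets must ship with your code
    model = torch.load(MODEL_PATH, map_location="cpu")
    rows = []
    for video_id in query_video_ids:
        video_path = DATA_DIRECTORY / "query" / f"{video_id}.mp4"
        rows.extend(run_matching(model, video_id, video_path))
    return pd.DataFrame(rows)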

Container access to data


All of the competition data you need will be available in the /data directory mounted on the container. This directory structure is replicated with example data in the runtime repository, so as you develop your solution you'll be working with the same structure your code will see during cloud code execution.

/                                 # Root directory
├── data/                         # All required competition data is mounted here
│   ├── query_metadata.csv        # Full test set query metadata 
│   ├── query_subset.csv          # Single-column CSV containing query ids 
│   │                             #   contained in the inference subset
│   ├── reference_metadata.csv    # Full test set reference metadata
│   └── query/                    # Directory containing a subset of test set
│       │                         #   query videos
│       ├── Q100005.mp4
│       ├── Q100301.mp4
│       ├── Q107382.mp4
│       └── ...
└── code_execution/               # Directory where your code submission zip will be
    │                             #   extracted to
    ├── main.py                   # Your submitted main.py script
    └── ...                       # the remaining contents of your submission.zip
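
For example, your code can resolve the subset's video files from the ids in query_subset.csv (using the video_id column, as in the template above):

from pathlib import Path

import pandas as pd

DATA_DIRECTORY = Path("/data")

# The subset CSV lists the query video ids your main.py must process
query_subset = pd.read_csv(DATA_DIRECTORY / "query_subset.csv")

# Each id corresponds to an .mp4 file in the query/ directory
video_paths = [
    DATA_DIRECTORY / "query" / f"{video_id}.mp4"
    for video_id in query_subset.video_id
]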

Differences between local and runtime

When running the container locally, /data is a mount of whatever you have in the /data directory of your local copy of the repository. Your code reads from this same /data path when it runs in the code execution environment.

Your code will not have network access so you should also package up any necessary resources. Your function may load model artifacts, call into other Python files, and use other resources you have packaged into the zipped submission.

Runtime specs


Your code is executed within a container that is defined in our runtime repository. The limits are as follows:

  • Your submission must be written in Python (Python 3.9.13) and use the packages defined in the runtime repository.
  • Your code may read the files in /data, but may not log any data about these files to the console. Doing so is grounds for disqualification. Using I/O or global variables to pass information between calls, printing out data to the logs, or other attempts to circumvent the setup of this challenge are grounds for disqualification. If in doubt whether something like this is okay, you may email us or post on the forum.
  • Your code may not reference or use data from your submitted full_matches.csv.
  • The submission must complete execution in less than 140 minutes (ten seconds per query for ~800 query videos, plus overhead); a local timing sketch follows this list.
  • The container has access to 6 vCPUs, 112 GiB RAM, 736 GiB disk storage, and 1 NVIDIA Tesla V100 GPU with 16 GB of memory.
  • The container will not have network access. All necessary files (code and model assets) must be included in your submission.
  • The container execution will not have root access to the filesystem.
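
While developing locally, you may want to check your inference speed against the roughly ten-seconds-per-query budget. Here is a rough sketch (process_one is a hypothetical stand-in for your per-query inference function; official timing is measured by the platform, not by this wrapper):

import time

def timed_run(query_video_ids, process_one):
    # For local testing only: in the runtime environment you should not
    # log information about the test files to the console
    results = []
    for video_id in query_video_ids:
        start = time.perf_counter()
        results.append(process_one(video_id))
        elapsed = time.perf_counter() - start
        if elapsed > 10.0:
            print(f"{video_id} took {elapsed:.1f}s, over the ~10s/query budget")
    return results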

Requesting packages


Since the Docker container will not have network access, all packages must be pre-installed. We are happy to consider additional packages as long as they are approved by the challenge organizers, do not conflict with each other, and can build successfully. Packages must be available through conda for Python 3.9.13. To request an additional package be added to the docker image, follow the instructions in the runtime repository README.

Happy building! Once again, if you have any questions or issues, you can always head over to the user forum!