Meta AI Video Similarity Challenge: Descriptor Track | Phase 1 Hosted By Meta


Code Submission Format

This is a hybrid code execution challenge. If you've never participated in a code execution challenge before, don't be intimidated! We make it as easy as possible for you.

You will be submitting your descriptor files, along with a script for generating descriptors from a subset of test set queries. Here's what your submission will look like at a high level:

<name>.zip                        # this is what you submit
├── query_descriptors.npz         # npz file containing query set descriptors
├── reference_descriptors.npz     # npz file containing reference set descriptors
├──                       # your script that will generate descriptors for
│                                 #   a subset of test set query videos
└── model_assets/                 # any assets required by script, like a
                                  #   model checkpoint

Your leaderboard score will be computed using the descriptor .npz files, and our compute cluster will also run your script to ensure that your submission runs successfully within the allotted time of 10 seconds per query. A score will be generated for the performance of the code-generated descriptors for the test subset, but this score will not determine your leaderboard ranking.


What you submit

Your job will be to submit a zip archive with the extension .zip. The root level of the archive must contain your descriptor files and your script.
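As a rough sketch of how you might package such an archive, the snippet below zips the contents of a local folder so that they sit at the root level of the zip, with no wrapping directory. The function name `build_submission` and the folder layout are hypothetical; only the requirement that descriptor files and the script live at the archive root comes from this page.

```python
import zipfile
from pathlib import Path


def build_submission(zip_path, root_dir):
    """Zip everything under root_dir so its contents sit at the archive root.

    Hypothetical helper: keep zip_path outside root_dir, otherwise the
    archive would try to include itself.
    """
    root_dir = Path(root_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root_dir.rglob("*")):
            if path.is_file():
                # arcname is relative to root_dir, so no wrapping folder is added
                archive.write(path, arcname=path.relative_to(root_dir).as_posix())
    return zip_path
```

Listing the archive afterwards with `zipfile.ZipFile(zip_path).namelist()` is a quick way to confirm that the descriptor files appear without a leading folder name.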

Descriptors .npz files

These files will contain descriptors for the ~8,000 query videos and ~40,000 reference videos in the test set (up to one descriptor for every second of video and a maximum of 512 dimensions). The query and reference descriptors should be submitted as npz files and formatted to contain three top-level variables as follows:

  • video_ids is a list of identifiers that provides the video id to which each descriptor vector corresponds, e.g., "Q100001" (do not include the .mp4 extension). These can be either a string or an integer (version without the Q or R prefix). Your video ids must be in sorted order.
    • Note that if you are generating this array from a pandas dataframe, you may end up with the object datatype. However, object datatype arrays will result in the following error: ValueError: Object arrays cannot be loaded when allow_pickle=False. If you encounter this error, you should convert your array to strings. See this StackOverflow post for more information.
  • timestamps is a 1D or 2D array of timestamps indicating the start and end times in seconds that the descriptor describes. In the case of a 1D array, your timestamps will be treated as a range with the same start and end timestamp. These are not used in descriptor track scoring, since your predictions for this track are based on the maximum similarity across all descriptors in a query-reference pair, but are useful for diagnostics and model interpretability.
  • features is a 32-bit float ndarray of descriptor embeddings for the corresponding video_id, up to one descriptor per second of video and with maximum descriptor dimensionality of 512. The features in your features array should properly correspond to the sorted video_ids array so that they are appropriately matched.
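The requirements in the list above can be checked locally before you upload. The following is a minimal sketch, not the official validation: the function name `find_format_problems` is hypothetical, and it assumes "sorted order" means Python's default ordering of the ids.

```python
import numpy as np

MAX_DIM = 512  # maximum descriptor dimensionality stated above


def find_format_problems(video_ids, timestamps, features):
    """Return a list of human-readable problems; an empty list means none found."""
    video_ids = np.asarray(video_ids)
    timestamps = np.asarray(timestamps)
    features = np.asarray(features)
    problems = []

    if video_ids.dtype == object:
        # Object arrays trigger the allow_pickle=False error described above
        problems.append("video_ids has object dtype; convert with .astype(str)")
    elif video_ids.tolist() != sorted(video_ids.tolist()):
        problems.append("video_ids are not in sorted order")

    if features.dtype != np.float32:
        problems.append("features must be 32-bit floats")
    if features.ndim != 2 or features.shape[1] > MAX_DIM:
        problems.append("features must be 2D with at most 512 columns")

    if timestamps.ndim not in (1, 2) or (
        timestamps.ndim == 2 and timestamps.shape[1] != 2
    ):
        problems.append("timestamps must be 1D, or 2D with start and end columns")

    if not (len(video_ids) == len(timestamps) == len(features)):
        problems.append("video_ids, timestamps and features differ in length")

    return problems
```

If the object-dtype problem is reported, calling `.astype(str)` on the ids array before saving is usually enough to resolve it.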

Note: As mentioned in the problem description, the limitation of one descriptor per second of video is a global limitation - that is, the total number of submitted or generated descriptors must be less than the total number of seconds of video in the test set or the code execution test subset. Participants can, if they so choose, distribute their descriptors among videos in a set in such a way that violates this “one frame per second of video per video” constraint, provided that the number of descriptors is still below the global threshold. For more information on how you may accomplish this, please see the rules on data use on the problem description page.
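To make the global limit concrete, here is a small sketch that compares the total number of descriptors against the total seconds of video. The function name and both dictionaries are hypothetical inputs that you would build yourself. The strict comparison follows the "less than" wording in the note, and how fractional seconds are counted is an assumption.

```python
def within_global_budget(descriptor_counts, durations_sec):
    """Check the global limit: total descriptors vs. total seconds of video.

    descriptor_counts: hypothetical dict mapping video id -> number of descriptors
    durations_sec:     hypothetical dict mapping video id -> duration in seconds
    """
    total_descriptors = sum(descriptor_counts.values())
    total_seconds = sum(durations_sec.values())
    # Strict comparison, following the "less than" wording of the note
    return total_descriptors < total_seconds


# One video uses more than one descriptor per second, yet the set stays in budget
counts = {"Q100001": 30, "Q100002": 5}
durations = {"Q100001": 20.0, "Q100002": 20.0}
in_budget = within_global_budget(counts, durations)
```

In this example the first video has 30 descriptors for 20 seconds of footage, but the 35 descriptors overall remain below the 40 seconds of video in the set.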

Here is an example of how you might create your descriptors and write them out as a .npz file using numpy.savez:

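import numpy as np

# `...` marks elided values; replace these lists with your real data
qry_video_ids = [20000, 20001, ..., 29998, 29999]  # Can also be str: "Q20000", ...
qry_timestamps = [[0.0, 1.1], [1.1, 2.2], ..., [52.9, 54.9], [51.1, 58.4]]
qry_descriptors = np.array(
    [
        [0.2343, -0.8415, ..., 1.3961, -1.3243],
        [-1.5233, 0.1302, ..., -0.8566, 0.0243],
        [1.4251, 0.1345, ..., 0.7582, -1.7841],
        [0.8537, 0.4745, ..., 0.1689, 1.3798],
    ]
).astype(np.float32)

# Write the three top-level variables described above
np.savez(
    "query_descriptors.npz",
    video_ids=qry_video_ids,
    timestamps=qry_timestamps,
    features=qry_descriptors,
)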


Unlike with the Matching Track, you do not need to submit ranking scores for query-reference pairs. We will take care of that on our end using just the descriptors.
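For intuition only, the sketch below shows one way a score for a single query-reference pair could be derived from descriptors as the maximum similarity across all descriptor pairs. It assumes inner-product similarity, which this page does not specify, and the function name `pair_score` is hypothetical. It is not the organizers' scoring code.

```python
import numpy as np


def pair_score(query_features, reference_features):
    """Maximum similarity over all descriptor pairs of one query and one reference.

    Assumption: similarity is the inner product between descriptors.
    """
    similarities = query_features @ reference_features.T  # shape (n_query, n_ref)
    return float(similarities.max())


query = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=np.float32)
reference = np.array([[0.5, 0.5], [0.0, 2.0]], dtype=np.float32)
score = pair_score(query, reference)
```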

You can find more example code for generating these files in the code execution runtime container repository.

Your script

Our compute cluster will also run your script to measure computational costs and performance on the test subset. Your script just needs to write out a subset of query descriptors in the same format as the .npz files you are already submitting.

Here's a simple boilerplate which shows how the script can iterate through the subset of test queries and write out a new descriptors file:

from pathlib import Path
import pandas as pd
import numpy as np

ROOT_DIRECTORY = Path("/code_execution")
DATA_DIRECTORY = Path("/data")
QUERY_SUBSET_FILE = DATA_DIRECTORY / "query_subset.csv"
OUTPUT_FILE = ROOT_DIRECTORY / "subset_query_descriptors.npz"


def generate_query_descriptors(
    query_video_ids,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    raise NotImplementedError(
        "This script is just a template. You should adapt it with your own code."
    )
    video_ids = ...
    descriptors = ...
    timestamp_intervals = ...
    return video_ids, descriptors, timestamp_intervals


def main():
    # Loading subset of query videos
    query_subset = pd.read_csv(QUERY_SUBSET_FILE)
    query_subset_video_ids = query_subset.video_id.values.astype("U")

    # Generation of query descriptors happens here
    query_video_ids, query_descriptors, query_timestamps = generate_query_descriptors(
        query_subset_video_ids
    )

    # Write the subset descriptors in the same format as the submitted .npz files
    np.savez(
        OUTPUT_FILE,
        video_ids=query_video_ids,
        features=query_descriptors,
        timestamps=query_timestamps,
    )


if __name__ == "__main__":
    main()

You will be responsible for implementing a version of the generate_query_descriptors function on your own, using your own model assets. Model assets, such as a checkpoint file, can be included in your .zip submission.
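Before wiring in a real model, it can help to run the pipeline end to end with a stand-in. The sketch below is a hypothetical placeholder that returns random descriptors in the same order as the template (video ids, descriptors, timestamp intervals). The function name, `per_video`, `dim` and `seed` are illustrative assumptions, and the output is not meaningful for scoring.

```python
import numpy as np


def generate_placeholder_descriptors(query_video_ids, per_video=3, dim=64, seed=0):
    """Return random descriptors shaped like a real submission (for plumbing tests)."""
    rng = np.random.default_rng(seed)
    # Sort ids and repeat each one once per descriptor
    ids = np.repeat(np.sort(np.asarray(query_video_ids).astype("U")), per_video)
    features = rng.standard_normal((len(ids), dim)).astype(np.float32)
    # One-second intervals [0, 1], [1, 2], ... for every video
    starts = np.tile(np.arange(per_video, dtype=np.float32), len(query_video_ids))
    timestamps = np.stack([starts, starts + 1.0], axis=1)
    return ids, features, timestamps
```

Because the fixed `per_video` count ignores real video lengths, replace it with your own logic before checking descriptor limits.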

Container access to data

You're going to have access to all the competition data you need in the /data directory mounted onto the container with the following contents. This directory structure is replicated with example data in the runtime repository so that as you're developing your solution, you'll be using the same structure that gets used on the cloud during code execution.

/                                 # Root directory
├── data/                         # All required competition data is mounted here
│   ├── query_metadata.csv        # Full test set query metadata 
│   ├── query_subset.csv          # Single-column CSV containing query ids 
│   │                             #   contained in the inference subset
│   ├── reference_metadata.csv    # Full test set reference metadata
│   └── query/                    # Directory containing a subset of test set query
│       ├── Q100005.mp4           #  videos
│       ├── Q100301.mp4
│       ├── Q107382.mp4
│       └── ...
└── code_execution/               # Directory where your code submission zip will be
    │                             #   extracted to
    ├──                   # Your submitted script
    └── ...                       # the remaining contents of your submission zip
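
Given the layout above, a script might resolve the video files for the inference subset roughly as follows. This is a sketch: the function name `query_video_paths` is hypothetical, and it assumes the single column in query_subset.csv is named `video_id`, as the boilerplate on this page suggests.

```python
import csv
from pathlib import Path


def query_video_paths(data_dir="/data"):
    """List the .mp4 path of every query id named in query_subset.csv."""
    data_dir = Path(data_dir)
    with open(data_dir / "query_subset.csv", newline="") as handle:
        # Assumption: the CSV header is `video_id`
        ids = [row["video_id"] for row in csv.DictReader(handle)]
    return [data_dir / "query" / f"{video_id}.mp4" for video_id in ids]
```

Taking the data directory as a parameter lets the same function run against the example data in the runtime repository and against the mounted /data directory.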

Differences between local and runtime

When running the container locally, /data is a mounted version of whatever you have in the repository locally in /data. This is also the same place your code will access data when it runs in the code execution environment.

Your code will not have network access so you should also package up any necessary resources. Your function may load model artifacts, call into other Python files, and use other resources you have packaged into the zipped submission.
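Since nothing can be downloaded at run time, it may be worth failing early when a packaged asset is missing. The helper below is a hypothetical sketch that looks for a file under the model_assets/ folder of the extracted submission. The name `asset_path` and the example file name are assumptions.

```python
from pathlib import Path


def asset_path(name, root=Path("/code_execution")):
    """Return the path of a packaged asset, raising if it was not included."""
    path = Path(root) / "model_assets" / name
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} is missing; package it in the zip, there is no network access"
        )
    return path
```

A call such as `asset_path("checkpoint.pt")` (file name hypothetical) at the top of your script surfaces a packaging mistake immediately.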

Runtime specs

Your code is executed within a container that is defined in our runtime repository. The limits are as follows:

  • Your submission must be written in Python (Python 3.9.13) and use the packages defined in the runtime repository.
  • Your code may read the files in /data, but may not log any information about them to the console or otherwise relay information about the contents of /data to you. Doing so is grounds for disqualification. Using I/O or global variables to pass information between calls, printing out data to the logs, or other attempts to circumvent the setup of this challenge are grounds for disqualification. If in doubt whether something like this is okay, you may email us or post on the forum.
  • Your code may not reference or use data from your submitted query_descriptors.npz.
  • The submission must complete execution in less than 150 minutes (ten seconds per query for ~800 query videos plus overhead for the similarity search).
  • The container has access to 6 vCPUs, 112 GiB RAM, 736 GiB disk storage, and 1 NVIDIA Tesla V100 GPU with 16 GB of memory.
  • The container will not have network access. All necessary files (code and model assets) must be included in your submission.
  • The container execution will not have root access to the filesystem.
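
The time limit above can be sanity-checked with a little arithmetic. The sketch below uses the approximate figures from this page (about 800 queries, ten seconds each, 150 minutes in total). The function name is hypothetical and the "remaining" figure is simple subtraction, not an official allocation.

```python
SECONDS_PER_QUERY = 10      # per-query allowance stated above
APPROX_SUBSET_SIZE = 800    # approximate number of subset query videos
TOTAL_LIMIT_MINUTES = 150   # overall execution limit


def budget_summary(
    n_queries=APPROX_SUBSET_SIZE,
    seconds_per_query=SECONDS_PER_QUERY,
    limit_minutes=TOTAL_LIMIT_MINUTES,
):
    """Return (minutes of inference allowance, minutes left under the limit)."""
    inference_minutes = n_queries * seconds_per_query / 60
    return inference_minutes, limit_minutes - inference_minutes


inference_minutes, remaining_minutes = budget_summary()
```

With these figures the inference allowance comes to roughly 133 minutes, leaving about 17 minutes for the similarity search and other overhead.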

Requesting packages

Since the Docker container will not have network access, all packages must be pre-installed. We are happy to consider additional packages as long as they are approved by the challenge organizers, do not conflict with each other, and can build successfully. Packages must be available through conda for Python 3.9.13. To request an additional package be added to the docker image, follow the instructions in the runtime repository README.

Happy building! Once again, if you have any questions or issues you can always head over to the user forum!