Where's Whale-do?

Help the Bureau of Ocean Energy Management (BOEM), NOAA Fisheries, and Wild Me accurately identify endangered Cook Inlet beluga whales from photographic imagery. Scalable photo-identification of individuals is critical to population assessment, management, and protection for these endangered whales.


Code Submission Format


This is a code submission challenge!

In a typical competition, you would craft your algorithms and generate predictions for the test dataset on your local machine. Then, you would submit the predictions to the competition for scoring.

For this competition, you will instead package the files needed to perform inference and submit those for containerized execution. The runtime repository contains the complete specification for the execution runtime.

If you've never participated in a code execution challenge before, don't be intimidated! We make it as easy as possible for you.


What you submit


Your job will be to submit a zip archive with the .zip extension (for example, submission.zip). The root level of the archive must contain a main.py script.

During code execution, your submission will be unzipped and our compute cluster will run main.py to perform inference on the test set queries.
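
If it's helpful, here's a minimal sketch of packaging a submission with Python's standard zipfile module; the assets/ directory is a hypothetical placeholder for any model weights or helper code your solution needs (you can just as easily build the archive with your favorite zip tool).

# Hypothetical packaging sketch: main.py must sit at the root of the archive,
# and assets/ stands in for any model weights or helper code you want to include.
from pathlib import Path
from zipfile import ZIP_DEFLATED, ZipFile

files_to_package = [Path("main.py")]
files_to_package += list(Path("assets").rglob("*"))

with ZipFile("submission.zip", "w", ZIP_DEFLATED) as zf:
    for file_path in files_to_package:
        zf.write(file_path)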

Here's a simple boilerplate main.py which shows how the script can iterate through all the test set queries and generate image rankings for each query:

from pathlib import Path
import pandas as pd


ROOT_DIRECTORY = Path("/code_execution")
DATA_DIRECTORY = ROOT_DIRECTORY / "data"
OUTPUT_FILE = ROOT_DIRECTORY / "submission/submission.csv"


def main():
    scenarios_df = pd.read_csv(DATA_DIRECTORY / "query_scenarios.csv")
    metadata_df = pd.read_csv(DATA_DIRECTORY / "metadata.csv")  # image metadata, available for your inference logic

    predictions = []

    for scenario_row in scenarios_df.itertuples():

        queries_df = pd.read_csv(DATA_DIRECTORY / scenario_row.queries_path)
        database_df = pd.read_csv(DATA_DIRECTORY / scenario_row.database_path)

        for query_row in queries_df.itertuples():
            query_id = query_row.query_id
            query_image_id = query_row.query_image_id
            database_image_ids = database_df["database_image_id"].values

            result_images, scores = predict(           #  <-- You should define this
                query_image_id, database_image_ids
            )

            for pred_image_id, score in zip(result_images, scores):
                predictions.append(
                    {
                        "query_id": query_id,
                        "database_image_id": pred_image_id,
                        "score": score,
                    }
                )

    predictions_df = pd.DataFrame(predictions)
    predictions_df.to_csv(OUTPUT_FILE, index=False)

if __name__ == "__main__":
    main()

Do you see the undefined predict function above? That's where you come in! In the following sections, we'll go into more detail about each of the files being loaded, and what the code is looping over.

This example is very simple; it is the structure followed by our "quickstart" example code submission, which simply returns the first 20 images in each database. You may also choose to perform computations earlier: our slightly more complex deep learning example precomputes embeddings for all images and then looks them up later. You can find both the quickstart and deep learning example submissions in the runtime repository. Your job will be to take the examples we've provided and adapt them to use your model to rank images of beluga whales.
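
As a concrete starting point, here's a minimal sketch of a predict function in the spirit of the quickstart: it ignores the query image entirely and returns the first 20 database images with a constant score. The signature matches the call in the boilerplate above; everything inside is a placeholder for your own ranking model.

def predict(query_image_id, database_image_ids, n_results=20):
    # Toy ranking in the spirit of the quickstart: return the first `n_results`
    # database images for every query, each with a constant confidence score.
    result_images = list(database_image_ids[:n_results])
    scores = [0.5] * len(result_images)
    return result_images, scores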

Container access to data


You will have access to all the competition data you need in the /code_execution/data directory, which is mounted into the container with the following contents. This directory structure is replicated with example data in the runtime repository, so that as you develop your solution you use the same structure that is used in the cloud during code execution.

/code_execution/                  # Runtime working directory
├── data/                         # All required competition data is mounted here
│   ├── databases/                # Directory containing the database image IDs for
│   │      │                      #   each scenario
│   │      ├── scenario01.csv
│   │      ├── scenario02.csv
│   │      ├── scenario03.csv
│   │      └── ...
│   ├── images/                   # Directory containing all the images
│   │      ├── test0001.jpg
│   │      ├── test0002.jpg
│   │      ├── test0003.jpg
│   │      └── ...
│   ├── queries/                  # Directory containing the query image IDs for
│   │      │                      #   each scenario
│   │      ├── scenario01.csv
│   │      ├── scenario02.csv
│   │      ├── scenario03.csv
│   │      └── ...
│   ├── metadata.csv              # CSV file with image metadata (image dimensions,
│   │                             #   viewpoint, date)
│   └── query_scenarios.csv       # CSV file that lists all test scenarios with paths
│                                 #   to associated query and database definitions
├── main.py                       # Your submitted entrypoint script that will be
│                                 #   executed by the container
└── submission/                   # Directory where your code submission zip will be
                                  #   extracted to

When running the container locally, /code_execution/data/ is a mounted copy of your local repository's data/ directory. On the official code execution platform, /code_execution/data/ will contain the real test data.

Your code will not have network access so you should also package up any necessary resources. Your function may load model artifacts, call into other Python files, and use other resources you have packaged into the zipped submission.
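
For example, here's a minimal sketch of loading a packaged model artifact. The assets/ directory and model.pkl file name are hypothetical placeholders for whatever you include in your zip, and pickle is just one serialization option:

from pathlib import Path
import pickle

# Resolve the assets directory relative to this script so the same code works
# locally and inside the container. "assets" and "model.pkl" are placeholders.
ASSETS_DIRECTORY = Path(__file__).parent / "assets"


def load_model():
    with open(ASSETS_DIRECTORY / "model.pkl", "rb") as f:
        return pickle.load(f)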

Procedure for test inference


The core task you will be evaluated on in this competition is generating image rankings for each query in each scenario. This happens during code execution via your main.py script, which iterates through each scenario and runs inference to generate your image rankings. In this section, we'll walk through each of the files involved. The example data tables shown below come from the provided example data, which is constructed from the training set; the real test data files are withheld from you, but they have the same format.

Scenarios, queries, and databases


The starting point is the query_scenarios.csv file. This file will have one row for each test scenario, along with paths to the files that specify that scenario's queries and database. These paths are all relative to /code_execution/data/. You will iterate through the rows of this file to evaluate each scenario.

scenario_id  queries_path            database_path
scenario01   queries/scenario01.csv  databases/scenario01.csv
scenario02   queries/scenario02.csv  databases/scenario02.csv


Each scenario will specify a path to a queries/scenario##.csv file. Each row in this file is one query, identified by its query_id. In your final output, you will use the query_id values to indicate which query each image ranking corresponds to. The query image for that query is specified by the query_image_id column.

query_id              query_image_id
scenario01-train2893  train2893
scenario01-train0829  train0829
scenario01-train2183  train2183
...                   ...


All queries in a given scenario are made against that scenario's single database, specified by the databases/scenario##.csv file. This file contains a single column, database_image_id, listing all of the images that make up the database.

database_image_id
train0010
train0019
train0072
...


Note that some test scenarios have been constructed in a leave-one-out manner, meaning that the query set for that scenario is a subset of the scenario's database. The correct procedure for such scenarios is to perform inference for each query image against the database excluding that query image. To facilitate caching, you are allowed to derive intermediate data structures from the entire database for the scenario (which will then include the query image of a given query), provided that you exclude the query image itself from your returned predictions; otherwise, your submission will result in a validation error during final scoring. In the example scenarios, scenario01 is constructed this way, while scenario02 is not.
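
In code, the exclusion is a one-line filter inside the per-query loop of the boilerplate above. A minimal sketch:

# For leave-one-out scenarios, drop the query image from the candidate list;
# for other scenarios this filter is a harmless no-op.
candidate_image_ids = [
    image_id for image_id in database_image_ids if image_id != query_image_id
]
result_images, scores = predict(query_image_id, candidate_image_ids)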

Images and metadata

Now that you have loaded the specifications for a scenario's queries and database, it's time to run inference. The values of query_image_id and database_image_id each specify individual images. You will find all images for all scenarios listed in the metadata.csv file. For more about the metadata itself, see the relevant section on the Problem Description page.

image_id   path                  height  width  viewpoint  date
train0000  images/train0000.jpg  463     150    top        2017-08-07
train0001  images/train0001.jpg  192     81     top        2019-08-05
train0002  images/train0002.jpg  625     183    top        2017-08-07
...        ...                   ...     ...    ...        ...


The image files for all scenarios are located in the /code_execution/data/images/ directory. You can load an image either by following the naming pattern <image_id>.jpg, or by using the path column in metadata.csv, which is relative to /code_execution/data/.
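
Either way takes just a couple of lines. Here's a minimal sketch, assuming the Pillow (PIL) package is available in the runtime image:

from pathlib import Path

import pandas as pd
from PIL import Image

DATA_DIRECTORY = Path("/code_execution/data")
metadata_df = pd.read_csv(DATA_DIRECTORY / "metadata.csv", index_col="image_id")


def load_image(image_id):
    # Option 1: follow the <image_id>.jpg naming pattern.
    # return Image.open(DATA_DIRECTORY / "images" / f"{image_id}.jpg")
    # Option 2: use the `path` column from metadata.csv (relative to the data directory).
    return Image.open(DATA_DIRECTORY / metadata_df.loc[image_id, "path"])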

Predictions format

Your main.py script should produce a predictions file at submission/submission.csv. This file should contain the image rankings for all test queries, concatenated into a single long-format table as shown below. The query_id value should match the identifier from the queries/scenario##.csv file, and the database_image_id should be the image_id of an image you are returning for that query. The score should be a confidence score expressed as a floating point number in the range [0.0, 1.0].

query_id              database_image_id  score
scenario01-train2893  train0010          0.5
scenario01-train2893  train0019          0.5
scenario01-train2893  train0072          0.5
...                   ...                ...
scenario01-train0829  train0010          0.5
scenario01-train0829  train0019          0.5
scenario01-train0829  train0072          0.5
...                   ...                ...
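
Before submitting, you may want to sanity-check your locally generated file against this format. Here's a minimal sketch of the checks described above (column names and score range); adjust the path to wherever your local run writes its output:

import pandas as pd

predictions_df = pd.read_csv("submission/submission.csv")

assert list(predictions_df.columns) == ["query_id", "database_image_id", "score"]
assert predictions_df["score"].between(0.0, 1.0).all(), "scores must be floats in [0.0, 1.0]"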


Scoring script and ground truth format

We provide a scoring script that calculates the challenge metric, which you can use locally to evaluate your predictions. This scoring script, along with ground truth data for the example scenarios, can be found in the scoring/ directory of the runtime repository. See the repository README for usage instructions.

When creating your own query scenarios, your ground truth file should follow the format below so that the scoring script can evaluate your performance. Each row in this file should be one database image that is a correct match for a query (i.e., it shows the same individual whale as that query's query image). The query_id should match the one from the queries/scenario##.csv file and from your predictions, and the database_image_id value should be the image_id of the database image. All correct matches across all queries and all scenarios should be concatenated in long format, as shown.

query_id              database_image_id
scenario01-train2893  train0588
scenario01-train2893  train0721
scenario01-train2893  train0970
...                   ...
scenario01-train0829  train0296
scenario01-train0829  train0362
scenario01-train0829  train0382
...                   ...
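
For example, if you hold out part of the training set to build a custom scenario, you can assemble a ground truth file in this format from the training labels. Here's a minimal sketch, assuming a hypothetical labels dataframe that maps each image_id to its individual whale ID (see the Problem Description page for the labels actually provided):

import pandas as pd


def build_ground_truth(labels_df, queries_df, database_df):
    # `labels_df` is a hypothetical dataframe with image_id and whale_id columns for
    # your held-out split; `queries_df` and `database_df` follow the queries/ and
    # databases/ CSV formats described above.
    image_to_whale = labels_df.set_index("image_id")["whale_id"]
    rows = []
    for query in queries_df.itertuples():
        query_whale = image_to_whale[query.query_image_id]
        for database_image_id in database_df["database_image_id"]:
            if database_image_id == query.query_image_id:
                continue  # a query image is not listed as a match for itself
            if image_to_whale[database_image_id] == query_whale:
                rows.append(
                    {"query_id": query.query_id, "database_image_id": database_image_id}
                )
    return pd.DataFrame(rows)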


Caching

Due to the way we construct test set scenarios, the same images, image comparisons, and databases will often be processed multiple times over the course of the full test set evaluation. In light of this, solutions are allowed to use in-memory or on-disk caching to speed up evaluation. Note that caching only works within a single submission's job; each submission's job is processed independently, and no state persists across jobs.

Any use of caching must follow competition rules. Of particular relevance is the requirement that all queries' predictions should be independent—the image rankings produced for a given query should only use information from that query image and the associated database, and should not change based on the existence or non-existence of other images or other queries involving the same images. This means that cached information that is reused between queries is compliant if and only if identical information would have been computed if the cache didn't already exist.

With that in mind, here are a few tips to help you think about implementing caching (a minimal sketch follows the list):

  • Each image in the test set has a unique image_id. You can reuse features computed for an image, or reuse the results of a comparison of a pair of images.
  • All queries for one scenario are made against the same database. If your solution involves the creation of a search data structure, you can reuse that data structure for all of those queries. Note again that some scenarios are constructed such that the query images are a subset of the database; you can still reuse the one data structure for that scenario, but you should explicitly exclude the query image itself from the returned results.
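
Putting those tips together, here's a minimal caching sketch using functools.lru_cache keyed on image_id; embed_image stands in for your own model, and load_image is a helper like the one sketched earlier:

from functools import lru_cache


@lru_cache(maxsize=None)
def get_embedding(image_id):
    # Keyed on the unique image_id, so the cached value is identical to what would
    # have been computed without the cache, keeping queries independent.
    # `load_image` is the helper sketched earlier; `embed_image` stands in for your model.
    return embed_image(load_image(image_id))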

Runtime specs


Your code is executed within a container that is defined in our runtime repository. The limits are as follows:

  • Your submission must be written in Python (Python 3.9.7) and use the packages defined in the runtime repository.
  • Your code may not read and inspect the files in /data directly. Doing so is grounds for disqualification. Instead, you will implement a script that passes the data into your model for inference. Using I/O or global variables to pass information between calls, printing out data to the logs, or other attempts to circumvent the setup of this prediction challenge are grounds for disqualification. If in doubt whether something like this is okay, you may email us or post on the forum.
  • The submission must complete execution in 3 hours or less.
  • The container has access to 6 vCPUs, 56 GB RAM, and 1 GPU with 12 GB of memory.
  • The container will not have network access. All necessary files (code and model assets) must be included in your submission.
  • The container execution will not have root access to the filesystem.

Requesting packages


Since the Docker container will not have network access, all packages must be pre-installed. We are happy to consider additional packages as long as they are approved by the challenge organizers, do not conflict with each other, and can build successfully. Packages must be available through conda for Python 3.9.7. To request that an additional package be added to the Docker image, follow the instructions in the runtime repository README.

Happy building! Once again, if you have any questions or issues you can always head over to the user forum!