Pri-matrix Factorization

Data scientists from more than 90 countries around the world drew on 300,000 video clips in a competition to build the best machine learning models for identifying wildlife from camera trap footage. The results are powerful and – equally important … #climate

€20,000 in prizes
dec 2017
321 joined

Problem description

Your model should identify the animals (or lack thereof) in a given Chimp&See video. There are 24 categories in total: 23 animal categories plus 1 category corresponding to no animal. Each video is identified by a 10-character alphanumeric string followed by .mp4, e.g., abcde12345.mp4. This index is referred to as the video's filename. Given a video file and its filename as input, your trained model should output a list of 24 probabilities corresponding to the model's confidence that each respective category is present in the video.

We have used the crowdsourced annotations from Chimp&See to generate ground truth labels for each video in the dataset. Some videos have no animals in them, in which case the blank category of the video's labels will be 1 and all other columns will be 0. Otherwise, if a species is present, its entry will be 1. Multiple species may be present!

Wisdom of the masses: a note on crowdsourcing the truth. We have taken many steps to go from raw annotations to a well-labeled dataset. This includes enforcing certain thresholds on how many user annotations are required to accept a label as well as thresholds related to percentages of user agreement. That said, this technique for leveraging crowdsourced data is uncharted territory and there is bound to be some noise!

The features in this dataset


Videos

The only features in this challenge are the videos themselves, named as subject_id.mp4. Each video is 15 seconds long, but it's unlikely that you'll need all 15 seconds of frames to make a good prediction. Whether or not you downsample the videos is up to you!
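One simple way to use fewer than all 15 seconds of frames is to pick a fixed number of evenly spaced frames per video. The sketch below only computes which frame indices to keep; it does not decode video. The function name `sample_frame_indices` and the 30 fps figure in the usage note are illustrative assumptions, not something the dataset specifies.

```python
def sample_frame_indices(total_frames, n_samples):
    """Return up to n_samples evenly spaced frame indices in [0, total_frames).

    Hypothetical helper: each index is the midpoint of one of n_samples
    equal-length segments of the video.
    """
    if total_frames <= 0 or n_samples <= 0:
        return []
    if n_samples >= total_frames:
        # Fewer frames than requested samples: keep every frame.
        return list(range(total_frames))
    step = total_frames / n_samples
    return [int(i * step + step / 2) for i in range(n_samples)]
```

For instance, if a 15-second clip happened to be recorded at 30 frames per second (an assumption), it would have 450 frames, and `sample_frame_indices(450, 5)` would select one frame from the middle of each 3-second segment.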

To help facilitate faster model prototyping, we've created two downsampled versions of the dataset, referred to as "Micro" and "Nano." See the table below for details about each version.

Dataset Version   Size      Resolution (px)                              Audio Channel
Raw               1 TB      960 × 540 (typically, but not all uniform)   yes
Micro             3.46 GB   64 × 64                                      no
Nano              1.4 GB    16 × 16                                      no

Labels

There are 24 categories which may be present or absent in each video. If a blank label is present, all other categories will be absent. For non-blank categories, multiple may be present.

Video label example


For example, a single label in the dataset may have these values, indicating the presence or absence of categories in video abc0000123.mp4:

filename abc0000123.mp4
duiker 1
bird 1
blank 0
cattle 0
chimpanzee 0
elephant 0
forest buffalo 0
hog 0
gorilla 0
hippopotamus 0
human 0
hyena 0
large ungulate 0
leopard 0
lion 0
other (non-primate) 0
other (primate) 0
pangolin 0
porcupine 0
reptile 0
rodent 0
small antelope 0
small cat 0
wild dog 0
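The rule that a blank label excludes every other category can be checked mechanically before training. The following is a minimal sketch, assuming each label has been loaded into a dict mapping category name to 0 or 1 (with the filename kept separately); the function name `is_consistent_label` is hypothetical.

```python
def is_consistent_label(label):
    """Check one video's label against the blank-exclusivity rule.

    label: dict of category name -> 0 or 1 (assumed layout, filename excluded).
    Returns False if any value is not 0/1, or if "blank" is set together
    with any other category.
    """
    if any(value not in (0, 1) for value in label.values()):
        return False
    others_present = any(
        value == 1 for name, value in label.items() if name != "blank"
    )
    if label.get("blank", 0) == 1 and others_present:
        return False
    return True
```

The example label above (duiker and bird present, blank absent) passes this check, while a label with both blank and bird set to 1 would not.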

Performance metric


Performance is evaluated according to a mean aggregated binary log loss. For each possible category in a video, the binary log loss is computed, and the results are then summed across categories (this accounts for the potential presence of multiple species in a single video). The sum of the binary losses represents the total loss for the video. The competitor who minimizes the mean value of this loss over all test videos will top the leaderboard.
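To make the metric concrete, here is a minimal sketch of it in plain Python: a per-category binary log loss, summed over the categories of a video, then averaged over videos. The function names and the clipping constant `eps` are assumptions for illustration; the description above does not say whether or how the official scorer clips probabilities.

```python
import math


def video_log_loss(y_true, y_pred, eps=1e-15):
    """Sum of binary log losses over all categories of one video.

    y_true: sequence of 0/1 labels; y_pred: predicted probabilities.
    eps is an assumed clipping value that keeps log() finite at 0 and 1.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total


def mean_aggregated_log_loss(labels, predictions, eps=1e-15):
    """Mean over videos of the per-video summed binary log loss."""
    losses = [video_log_loss(t, p, eps) for t, p in zip(labels, predictions)]
    return sum(losses) / len(losses)
```

Because the per-category losses are summed rather than averaged, a confident wrong prediction in any one of the 24 columns adds directly to a video's total loss.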

Submission format


Each row of the submission file contains the filename followed by one floating-point value per category, giving the predicted probability that the category is present in the video. In the extreme case, every non-blank category could be, say, 0.99, which would indicate strong confidence that all 23 animal categories are present in the video.

For example, if you predicted...
subject_id       bird      blank  cattle  chimpanzee  elephant  forest buffalo  gorilla  ...  hog
abc0000001.mp4   0.777778  0.0    0.0     0.0         0.0       0.111111        0.0      ...  0.0
abc0000002.mp4   0.0       0.0    0.0     0.0         0.0       0.0             1.0      ...  0.0
abc0000003.mp4   0.0       0.0    0.0     0.0         0.0       0.0             0.0      ...  0.0
abc0000004.mp4   0.0       0.0    0.0     0.0         1.0       0.0             0.0      ...  0.0
abc0000005.mp4   0.0       0.0    0.0     0.0         0.0       0.0             0.0      ...  0.0

The .csv file that you submit would look like:

subject_id,bird,blank,cattle,chimpanzee,elephant,forest buffalo,gorilla,hippopotamus,human,hyena,large ungulate,leopard,lion,other (non-primate),other (primate),pangolin,porcupine,reptile,rodent,small antelope,small cat,wild dog,duiker,hog
abc0000001.mp4,0.7777777777777778,0.0,0.0,0.0,0.0,0.1111111111111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1111111111111111,0.0,0.0,0.0
abc0000002.mp4,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
abc0000003.mp4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.875,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0
abc0000004.mp4,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
abc0000005.mp4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
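A file in this shape can be produced with Python's standard csv module. The sketch below is one possible approach, not an official tool: the `CATEGORIES` list copies the column order from the header line above, while the function name `write_submission`, the dict-based input layout, and the choice to fill missing categories with 0.0 are all assumptions made for illustration.

```python
import csv
import io

# Column order copied from the example header above.
CATEGORIES = [
    "bird", "blank", "cattle", "chimpanzee", "elephant", "forest buffalo",
    "gorilla", "hippopotamus", "human", "hyena", "large ungulate", "leopard",
    "lion", "other (non-primate)", "other (primate)", "pangolin", "porcupine",
    "reptile", "rodent", "small antelope", "small cat", "wild dog", "duiker",
    "hog",
]


def write_submission(predictions, out):
    """Write predictions to a file-like object in the submission layout.

    predictions: dict of filename -> dict of category -> probability
    (assumed layout). Categories missing from a video's dict are written
    as 0.0, which is an illustrative default rather than a recommendation.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["subject_id"] + CATEGORIES)
    for filename in sorted(predictions):
        probs = predictions[filename]
        row = [float(probs.get(category, 0.0)) for category in CATEGORIES]
        if any(not 0.0 <= p <= 1.0 for p in row):
            raise ValueError(f"probability out of range for {filename}")
        writer.writerow([filename] + row)


# Usage: build the file in memory; pass an open file object to write to disk.
buffer = io.StringIO()
write_submission(
    {"abc0000003.mp4": {"other (primate)": 0.875, "rodent": 0.125}}, buffer
)
submission_text = buffer.getvalue()
```

Writing one value per header column keeps every row at exactly 25 fields (the filename plus 24 probabilities), which is worth verifying before uploading.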

Good luck!


If you're wondering how to get started, check out our benchmark blog post!

Good luck and enjoy this problem! If you have any questions you can always visit the user forum!