competition
complete
$5,000

Woohoo! This competition has come to a close!

Many thanks to the participants for all of their hard work and commitment to using data for good!

Problem description

This is where you'll find all of the documentation about this dataset and the problem we are trying to solve. For this competition, there are three subsections to the problem description: the features in this dataset, the labels in this dataset, and the submission format.

The features in this dataset


Your goal is to predict for each image whether it is a bumble bee (Bombus) or a honey bee (Apis). All of the images have been scaled and cropped so they are 200px x 200px. These images are submitted by citizen scientists and vary in terms of image quality, distance from subject, background and position of the bee.

[Example images: Apis (honey bee) and Bombus (bumble bee)]


It's up to you to build features by hand using image processing techniques, transforms, and filters. Or, you can try techniques that will learn the features for you. How you create features from the raw images is going to be the key to this challenge.
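As one illustration of a hand-built feature, the sketch below computes a normalized per-channel color histogram from a list of RGB pixels. It is a minimal example under stated assumptions, not a recommended approach for this competition: the function name color_histogram, the bin count, and the toy pixel values are all hypothetical, and decoding the actual .jpg files into pixels is left to an imaging library of your choice.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in the range 0-255


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Return a normalized per-channel histogram with 3 * bins values."""
    if not pixels:
        raise ValueError("image has no pixels")
    width = 256 / bins  # intensity range covered by each bin
    counts = [0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            counts[channel * bins + int(value // width)] += 1
    total = len(pixels)
    # Each channel's bins sum to 1, so the whole vector sums to 3.
    return [count / total for count in counts]


# Hypothetical 2x2 image flattened to a pixel list (assumption for illustration).
toy_image = [(255, 200, 0), (250, 210, 10), (30, 30, 30), (20, 25, 35)]
features = color_histogram(toy_image, bins=4)
```

A fixed-length vector like this can be fed to any standard classifier; techniques that learn features directly from the pixels would skip this step.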

File Names

There are two folders of images for download, one labeled test and one labeled train. Inside those folders are the images of bees that correspond to the test set and the training set. The files are named with the convention {id}.jpg. The {id} in the filename matches the id column in the training labels for the training data and in the submission format for the test data.
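The naming convention makes it straightforward to go from an id to an image file. The sketch below shows one way to do that with the standard library. It assumes the training labels file has id and genus columns like the submission format; the helper names image_path and read_labels and the sample rows are hypothetical.

```python
import csv
import io
from pathlib import Path


def image_path(folder: str, image_id: str) -> Path:
    """Build the path for an image named with the {id}.jpg convention."""
    return Path(folder) / f"{image_id}.jpg"


def read_labels(csv_file) -> dict:
    """Map each id to its genus value, assuming id,genus columns."""
    return {row["id"]: float(row["genus"]) for row in csv.DictReader(csv_file)}


# Hypothetical in-memory stand-in for the training labels file.
sample = io.StringIO("id,genus\n2783,1.0\n2175,0.0\n")
labels = read_labels(sample)
paths = {image_id: image_path("train", image_id) for image_id in labels}
```

With a real download, you would pass an open file handle for the labels file instead of the in-memory sample.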

The labels in this dataset


[Figure: Distribution of Labels]

The data set is about one quarter Apis (honey bee) and three quarters Bombus (bumble bee). This is partly because the dataset contains a greater number of species under the Bombus genus than under the Apis genus. Depending on the success of competitors at identifying the genus of bees, we may run a future competition to identify these bees at the species level.
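Because the classes are imbalanced, it can help to measure the split in your own training labels and, if your model supports it, weight the classes accordingly. The sketch below is one possible way to do this, assuming labels are encoded as 0 for Apis and 1 for Bombus; the function names and the toy label list are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def class_fractions(labels: Sequence[int]) -> Dict[int, float]:
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def balanced_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class inversely to its frequency."""
    fractions = class_fractions(labels)
    num_classes = len(fractions)
    return {label: 1.0 / (num_classes * frac) for label, frac in fractions.items()}


# Toy labels mirroring the rough 1:3 split (assumed encoding: 0 = Apis, 1 = Bombus).
toy_labels = [1, 1, 1, 0]
fractions = class_fractions(toy_labels)
weights = balanced_weights(toy_labels)
```

With this weighting, the rarer Apis class counts for more per sample, so both classes contribute equally in aggregate.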

Submission format


Each row of the submission file pairs an image id with a float between 0 and 1 representing the probability that the bee pictured is a bumble bee (Bombus). A prediction closer to 0 means the bee is more likely to be of the genus Apis. A prediction closer to 1 means it is more likely to be of the genus Bombus. For an example, see SubmissionFormat.csv on the data download page.

Apis  Bombus
0     1

For example, if you had no idea which genus any of the bees belonged to, you would predict 0.5 for every image:
id    genus
2783  0.5
2175  0.5
4517  0.5
2831  0.5
3556  0.5

Your .csv file that you submit would look like:

id,genus
2783,0.5
2175,0.5
4517,0.5
2831,0.5
3556,0.5
3111,0.5
3113,0.5
3962,0.5
1664,0.5
...
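A file in the format above can be produced with the standard csv module. The sketch below is a minimal example, not an official tool: the function name write_submission is hypothetical, and the range check simply enforces the stated requirement that each prediction lie between 0 and 1.

```python
import csv
import io


def write_submission(predictions, out_file) -> None:
    """Write (id, probability) pairs as an id,genus CSV."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["id", "genus"])
    for image_id, prob in predictions:
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"prediction for id {image_id} is outside [0, 1]: {prob}")
        writer.writerow([image_id, prob])


# Written to an in-memory buffer here; a real run would open a .csv file instead.
buffer = io.StringIO()
write_submission([(2783, 0.5), (2175, 0.5)], buffer)
submission_text = buffer.getvalue()
```

To write to disk, open the target file with newline="" as the csv module documentation recommends and pass the handle as out_file.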



Good luck and enjoy this problem! If you have any questions you can always visit the user forum!