Naive Bees Classifier

Can you identify a bee as a honey bee or a bumble bee? Practice image processing and classification techniques and help researchers seeking to protect pollinators from collapse. #science

$5,000 in prizes
Dec 2015
431 joined

The features in this dataset

Your goal is to predict for each image whether it is a bumble bee (Bombus) or a honey bee (Apis). All of the images have been scaled and cropped so they are 200px x 200px. These images are submitted by citizen scientists and vary in terms of image quality, distance from subject, background and position of the bee.

[Images: Apis (honey bee) and Bombus (bumble bee)]


It's up to you to build features by hand using image processing techniques, transforms, and filters. Or, you can try techniques that will learn the features for you. How you create features from the raw images is going to be the key to this challenge.
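As one illustration of a hand-built feature, the sketch below computes a normalized per-channel color histogram. It is only a minimal example, not a recommended approach for this challenge. It assumes an image has already been decoded into a sequence of (red, green, blue) tuples with values from 0 to 255; the function name color_histogram, the bins parameter, and the toy image are all hypothetical.

```python
from typing import Iterable, List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in 0..255


def color_histogram(pixels: Iterable[Pixel], bins: int = 8) -> List[float]:
    """Return a normalized per-channel histogram as one flat feature vector."""
    if not 1 <= bins <= 256:
        raise ValueError("bins must be between 1 and 256")
    counts = [0] * (3 * bins)
    total = 0
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Map an intensity in 0..255 to a bin index in 0..bins-1.
            index = value * bins // 256
            counts[channel * bins + index] += 1
        total += 1
    if total == 0:
        raise ValueError("image has no pixels")
    # Divide by the pixel count so that each channel's bins sum to 1.
    return [count / total for count in counts]


# Hypothetical stand-in for a decoded 200px x 200px image of a single yellow tone.
toy_image = [(230, 200, 40)] * (200 * 200)
features = color_histogram(toy_image, bins=4)
```

The result is a vector of 3 x bins numbers per image that could be passed to any classifier. A histogram discards all spatial information, so it is best seen as a starting point.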

File Names

There are two folders of images for download, one labeled test and one labeled train. Each folder holds the bee images for the corresponding set. The files are named with the convention {id}.jpg. The {id} in the filename matches the id column in the training labels for the training data and in the submission format for the test data.
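The naming convention makes it easy to move between ids and files. The sketch below assumes the ids are integers (as in the submission example) and that the two folders sit under a download directory called data; that directory name and both helper functions are hypothetical.

```python
from pathlib import Path


def image_path(root: Path, split: str, image_id: int) -> Path:
    """Build the path for one image, following the {id}.jpg convention."""
    if split not in ("train", "test"):
        raise ValueError("split must be 'train' or 'test'")
    return root / split / f"{image_id}.jpg"


def id_from_path(path: Path) -> int:
    """Recover the integer id from a file named {id}.jpg."""
    return int(path.stem)


data_root = Path("data")  # hypothetical download location
paths = [image_path(data_root, "test", i) for i in (2783, 2175, 4517)]
ids = [id_from_path(p) for p in paths]
```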

Distribution of Labels

The data set is about one quarter Apis (honey bee) and three quarters Bombus (bumble bee). This is partly because the dataset contains more species under the Bombus genus than under the Apis genus. Depending on the success of competitors at identifying the genus of bees, we may run a future competition to identify these bees at the species level.
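Because the classes are imbalanced, it can help to know the class prior before training anything. The sketch below estimates the fraction of Bombus images from a list of labels. It assumes the training labels encode Bombus as 1 and Apis as 0, mirroring the submission format; the function name and the toy labels are hypothetical and are not drawn from the real data.

```python
from typing import Sequence


def bombus_prior(labels: Sequence[float]) -> float:
    """Fraction of labels equal to 1.0 (assumed here to mean Bombus)."""
    if not labels:
        raise ValueError("no labels given")
    return sum(1 for label in labels if label == 1.0) / len(labels)


# Hypothetical toy labels that mirror the stated one-quarter / three-quarters split.
toy_labels = [0.0, 1.0, 1.0, 1.0] * 5
constant_prediction = bombus_prior(toy_labels)
```

Predicting this prior for every image gives a constant baseline that any trained model should be able to beat.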

Submission format

Each row of the submission file contains an image id and a float between 0 and 1 representing the probability that the bee pictured is a bumble bee (Bombus). A prediction closer to 0 means the bee is more likely to be of the genus Apis. A prediction closer to 1 means it is more likely to be of the genus Bombus. For an example, see SubmissionFormat.csv on the data download page.

Apis    Bombus
0       1

For example, if you had no idea which genus any of the bees belonged to, you would predict 0.5 for every image:

id      genus
2783    0.5
2175    0.5
4517    0.5
2831    0.5
3556    0.5

The .csv file that you submit would look like:

id,genus
2783,0.5
2175,0.5
4517,0.5
2831,0.5
3556,0.5
3111,0.5
3113,0.5
3962,0.5
1664,0.5
...
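A file in this shape can be produced with Python's standard csv module. The sketch below is one possible way to do it; the function name write_submission and the in-memory buffer are illustrative choices, and the three ids are taken from the example above.

```python
import csv
import io
from typing import Mapping, TextIO


def write_submission(predictions: Mapping[int, float], out: TextIO) -> None:
    """Write id,genus rows, where genus is the predicted probability of Bombus."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "genus"])
    for image_id, probability in predictions.items():
        # Reject values that are not valid probabilities before writing them.
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability for id {image_id} is outside [0, 1]")
        writer.writerow([image_id, probability])


# Uninformed baseline: 0.5 for every id.
baseline = {2783: 0.5, 2175: 0.5, 4517: 0.5}
buffer = io.StringIO()
write_submission(baseline, buffer)
submission_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with open("submission.csv", "w", newline="") in place of the buffer; that filename is only an example. Rows are written in the dictionary's insertion order, so build the dictionary in the order given by SubmissionFormat.csv if the row order needs to match it.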

Good luck and enjoy this problem! If you have any questions you can always visit the user forum!