Deep Chimpact: Depth Estimation for Wildlife Conservation Hosted By MathWorks


Project background

We rely on Earth’s natural ecosystems every day. Healthy forests store carbon that would otherwise be emitted into the atmosphere. Animals and insects pollinate crops that yield food. Natural tourist destinations provide income for many low-resource communities. Protecting wild habitats helps prevent the spread of infectious diseases from animals to humans.

The health of these ecosystems depends on a varied and complex web of flora and fauna, called biodiversity. Today, we are facing a biodiversity crisis as more and more species become endangered or extinct.

Conservation efforts to monitor and protect biodiversity often involve tracking the size of a species’ population in the wild. Among the best tools we have for studying wildlife populations are camera traps. Triggered by movement or heat, camera traps capture enormous amounts of footage of the natural world without human interference.

An image of two adult chimpanzees and a baby chimpanzee captured by a camera trap in Moyen-Bafing National Park, Republic of Guinea

In 2017, researchers at the Max Planck Institute for Evolutionary Anthropology partnered with DrivenData to host a machine learning competition aimed at one of the foundational challenges in processing camera trap footage: species identification. The winning algorithm was developed into a freely available Python package called Zamba, which means “forest” in Lingala.

To fully automate species abundance estimation from camera trap footage, however, detection and classification are not enough. Researchers also need to know the distance between the animal and the camera trap. The probability of observing an animal depends on its depth, so depth is a key input into the best statistical models for estimating population size, a method called distance sampling (Howe et al., 2017).
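To make depth's role concrete, here is a minimal sketch of the distance-sampling idea: the chance of detecting an animal falls off with its distance from the camera, so raw counts must be corrected upward by the average detection probability. The half-normal detection function is a standard choice in the distance-sampling literature, but the sigma value and counts below are illustrative assumptions, not values from this challenge.

```python
import math

def half_normal_detection(d, sigma):
    """Half-normal detection function g(d): probability of detecting
    an animal at distance d metres from the camera trap."""
    return math.exp(-(d ** 2) / (2 * sigma ** 2))

def mean_detection_probability(sigma, w, n=1000):
    """Average detection probability out to a truncation distance w,
    approximated with a simple midpoint Riemann sum."""
    step = w / n
    total = sum(half_normal_detection((i + 0.5) * step, sigma)
                for i in range(n))
    return total * step / w

# Illustrative numbers: sigma = 5 m, observations truncated at 15 m.
p_hat = mean_detection_probability(sigma=5.0, w=15.0)

n_observed = 42                    # hypothetical count of detections
n_estimated = n_observed / p_hat   # corrects for animals the camera missed
```

Because `p_hat` is well below 1, the corrected estimate is substantially larger than the raw count; without per-animal distances, that correction cannot be made.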

Currently, depth estimation requires either manual human annotation or multiple camera traps at the same location, both of which are very time intensive. On average, it takes a researcher more than 10 minutes to label distances for every 1 minute of video. That’s a lot of time when you have a million videos! The goal of this challenge is to build computer vision models that automatically predict an animal’s depth, greatly accelerating this process.

Left: An image of a chimpanzee from a camera trap. Right: The depth mask generated for the image by the monodepth2 model available on GitHub.
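Depth masks like the one shown are typically rendered by rescaling the model's raw per-pixel depth values into an 8-bit greyscale image. The helper below is a hypothetical illustration of that step (it is not part of monodepth2), mapping nearer pixels to brighter values:

```python
def depth_to_mask(depth_rows):
    """Rescale a 2-D grid of raw depth values (arbitrary floats, one
    per pixel) to 0-255 greyscale, with nearer pixels brighter."""
    flat = [v for row in depth_rows for v in row]
    lo, hi = min(flat), max(flat)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    # (hi - v) inverts the scale so small depths (near) come out bright.
    return [[round((hi - v) * scale) for v in row] for row in depth_rows]
```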

The ability to automate depth estimation from camera traps would enable conservationists to get rapid information on population changes for a wide range of species, helping to identify populations under threat, evaluate the effectiveness of different conservation strategies, and more quickly respond to the escalating threats to Earth’s ecosystems.

About the data

The competition data was collected and labelled by two research teams from the Max Planck Institute for Evolutionary Anthropology (MPI-EVA) and the Wild Chimpanzee Foundation (WCF).

Each dataset was manually labelled using reference videos as guides to determine distance. To create the reference videos, field researchers recorded themselves walking away from each camera trap, holding up a sign every meter to indicate how far they were from the camera. (Imagine how much easier it would be if a machine could do that instead!) As a result, the data for this challenge presents an exceptional resource for developing depth estimation models for wildlife conservation.
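Reference frames like these also hint at how a model's output could be grounded in real units: monocular depth models generally predict *relative* depth, so known sign distances can anchor those values to metres. The sketch below assumes a simple linear relationship and entirely hypothetical model outputs; real calibration may need a more careful (e.g. per-camera, nonlinear) fit.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit y = a*x + b, closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Relative depth the model assigned to the sign in each reference frame,
# paired with the true distance written on the sign (hypothetical values).
model_depth = [0.11, 0.19, 0.31, 0.42, 0.48]
true_metres = [1.0, 2.0, 3.0, 4.0, 5.0]

a, b = fit_linear(model_depth, true_metres)

def to_metres(relative_depth):
    """Convert a relative depth prediction to estimated metres."""
    return a * relative_depth + b

animal_distance = to_metres(0.35)  # estimate for a new animal detection
```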

Researcher Serge Armand Bahi in Taï National Park, Côte d'Ivoire, holding up a sign at 5 meters away as part of a depth reference video. Image courtesy of Wild Chimpanzee Foundation.

Helpful resources

Distance sampling and camera traps

Depth estimation