Deep Chimpact: Depth Estimation for Wildlife Conservation Hosted By MathWorks


An image of a chimpanzee and a model-generated depth mask for the image

The depth estimation models and level of precision achieved by the winning teams offer massive time saving in the range of 40-60% for the processing of camera trap footage. These will help us to develop effective monitoring approaches for surveying hundreds of wildlife species.

— Hjalmar Kühl, Senior Scientist at the iDiv (German Centre for Integrative Biodiversity Research Halle-Jena-Leipzig)


To protect the Earth's natural resources amid environmental and human pressures, conservationists need to be able to monitor species population sizes and population change. Camera traps are widely used in conservation research to capture images and videos of wildlife without human interference.

Using statistical models for distance sampling, researchers can combine the frequency of animal sightings with each animal's distance from the camera to estimate a species' full population size. However, getting distances from camera trap footage currently requires a manual, extremely time-intensive process. This creates a bottleneck for conservationists working to understand and protect wild animals and their ecosystems.

The Solution

The goal of this challenge was to use machine learning and advances in monocular (single-lens) depth estimation techniques to automatically estimate the distance between a camera trap and an animal contained in its video footage. The challenge drew on a unique labeled dataset from research teams at the Max Planck Institute for Evolutionary Anthropology (MPI-EVA) and the Wild Chimpanzee Foundation (WCF). Participants were evaluated on how accurately their automated algorithms predicted the distance between the camera and the animal at each point in time.

The Results

Over the course of the competition, participants tested over 900 solutions and were able to significantly advance existing methods of depth estimation in wildlife contexts. One of the most recent studies applying machine learning to depth estimation, Overcoming the Distance Estimation Bottleneck in Estimating Animal Abundance with Camera Traps (2021), used monocular depth estimation to achieve a mean absolute error of 1.85 m. The method relied on a series of reference videos that required field researchers to travel to and visually document distances at each camera trap location.

To more fully automate this time-intensive process, competitors were not provided with reference videos. But our winners were not deterred! The top-scoring solutions improved on the state-of-the-art method, with the top model achieving a mean absolute error (MAE) of 1.62 m. The winning approaches were also most accurate at close distances, a useful property for conservation applications that rely on distance sampling. These innovative models could help make depth estimation more accessible to conservationists around the world, even those without the time or resources to create reference videos.
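To make the mean absolute error figures concrete, here is a minimal sketch of how MAE over predicted distances could be computed, plus a per-distance-band breakdown of the kind that would reveal whether a model is most accurate at close range. The function names, the distance bands, and the sample numbers are hypothetical illustrations, not the competition's actual scoring code or data; it also assumes each prediction is paired with one ground-truth distance in meters (for example, one pair per video timepoint).

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all paired distances (in meters)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one distance pair")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_by_range(y_true, y_pred, edges):
    """MAE within each half-open band [lo, hi) of the *true* distance.

    `edges` is a hypothetical list of band boundaries, e.g. [0, 5, 10, inf].
    Bands containing no samples are omitted from the result.
    """
    results = {}
    for lo, hi in zip(edges, edges[1:]):
        pairs = [(t, p) for t, p in zip(y_true, y_pred) if lo <= t < hi]
        if pairs:
            results[(lo, hi)] = mean_absolute_error(
                [t for t, _ in pairs], [p for _, p in pairs]
            )
    return results


if __name__ == "__main__":
    # Hypothetical ground-truth and predicted distances, in meters.
    true_m = [2.0, 5.5, 8.0, 12.0]
    pred_m = [2.5, 5.0, 10.0, 9.0]
    print(mean_absolute_error(true_m, pred_m))  # 1.5
    print(mae_by_range(true_m, pred_m, [0, 5, 10, float("inf")]))
```

Binning by the true distance rather than the predicted one keeps each band's error tied to where the animal actually was; in this made-up sample the error grows with distance, which is the pattern described for the winning models.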

All of the prize-winning solutions from this competition are linked below and made available for anyone to use and learn from. The project team is now working to apply these solutions in connection with the open-source Project Zamba, a computer vision repository for wildlife conservation.