Introduction

Observational studies of wildlife occupancy and abundance are more important than ever, as human disturbance has decreased wildlife population sizes by up to 60% globally in the last four decades (WWF 2018). These staggering declines have increased the importance of ecological monitoring through a variety of means, including camera traps, mark-recapture methods, point counts, and line transects. Camera traps have become an especially useful tool for the rapid assessment of wildlife because they require fewer field hours than other common field methods, produce images that may be reviewed by other researchers, and minimize disturbance to the environment (Silveira et al. 2003, Steenweg et al. 2017, McCallum 2013). While camera traps are a useful tool for some ecological studies, processing the massive quantities of images created by camera trap networks is a major limiting factor for researchers. Until methods are developed to process images efficiently, these limitations will persist in future studies, particularly as camera trap networks become more complex.
Previous camera trap studies have noted factors that result in large accumulations of images. Wind, loose shrubbery, camera settings, and animal behavior specific to each camera site add noise to the dataset (Newey et al. 2015). The time involved in manually processing these false triggers, which often represent a majority of captured images, can delay analysis to the point where conclusions are no longer relevant, because a large expenditure of resources is often required to process images manually (Willi et al. 2019).
The increasing use of camera traps in ecological studies has led to a push for standardized methods to improve the workflow of image analysis (Glover2019). One promising avenue for processing camera trap images is the use of artificial intelligence (AI).
Convolutional neural networks (CNNs) have been trained and tested on several large datasets previously processed by citizen scientists. Swanson et al. (2015) created and trained a CNN for the Snapshot Serengeti dataset, which consists of 3.2 million images collected over 99,241 camera trap days. The output of the neural network reached an accuracy of greater than 93.8% when compared to the records of citizen scientists. While several large-scale studies (e.g. Norouzzadeh et al. 2018) have achieved similar accuracy on such large datasets, training these neural networks requires large numbers of images and substantial computing time. Such investments are often not feasible for smaller camera trap studies, given the current assumption that many thousands of images are needed to successfully train a model.
Only the largest camera trap studies have attempted to create their own neural networks, as it has been suggested that small clusters of images (~1,000-5,000 images per species class) are not sufficient for deep learning (e.g. Norouzzadeh et al. 2018). Each model built by these large-scale studies must be tailored to a particular set of species in order to function properly, because neural networks are a complex series of algorithms that are used to detect specific features in supervised data. The neural network learns the features belonging to each species class, allowing it to differentiate between objects and the background of images while also classifying objects. Therefore, a model trained for one study may not match another study’s range of objects and backgrounds closely enough to be useful, even in the same geographical location.
We suggest that the use of transfer training on neural networks has been overlooked for small-scale camera trap studies. Adapting a neural network to a dataset by adjusting the final layers of the network through transfer learning and then reinforcement learning on a desired image set can be extremely useful, especially when data are scarce. We predict that a premade neural network could achieve identification accuracy similar to that of neural networks trained with thousands of images while not requiring such a large memory footprint. Using a transfer-trained neural network allows camera trap surveys to be affordable, data efficient, and accessible to a broad range of projects.
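The idea of adjusting only the final layers of a network can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the model used in this study: the fixed weights `W_FROZEN` stand in for a pretrained network's feature layers, the six labeled inputs in `DATA` are invented for illustration, and the function names are hypothetical. Only the final layer (a logistic classifier) is trained, while the feature layers are left untouched.

```python
import math

# Hypothetical frozen "backbone": fixed weights standing in for a pretrained
# network's feature layers. They are never updated during training.
W_FROZEN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical toy data: 2-value inputs labeled 1 (animal) or 0 (false trigger).
DATA = [
    ([1.0, 0.8], 1), ([0.9, 1.2], 1), ([1.5, 0.3], 1),
    ([-1.0, -0.7], 0), ([-0.8, -1.1], 0), ([-1.3, -0.2], 0),
]


def frozen_features(x, w_frozen):
    """Map an input to features using the fixed backbone weights."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_frozen]


def predict(feats, head_w, head_b):
    """Final layer: logistic regression on the frozen features."""
    z = sum(w * f for w, f in zip(head_w, feats)) + head_b
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, w_frozen, epochs=200, lr=0.5):
    """Fit only the final layer with stochastic gradient descent on log loss."""
    head_w = [0.0] * len(w_frozen)
    head_b = 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = frozen_features(x, w_frozen)
            err = predict(feats, head_w, head_b) - y  # d(log loss)/dz
            head_w = [w - lr * err * f for w, f in zip(head_w, feats)]
            head_b -= lr * err
    return head_w, head_b


if __name__ == "__main__":
    head_w, head_b = train_head(DATA, W_FROZEN)
    for x, y in DATA:
        p = predict(frozen_features(x, W_FROZEN), head_w, head_b)
        print(x, y, round(p, 3))
```

Because the feature layers are reused rather than relearned, the number of trainable parameters is small (three weights and one bias here), which is the property that makes this kind of adaptation plausible when labeled images are scarce.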
Neural networks are used for all types of image processing and many are freely available through open-source software (e.g. Google, PyTorch, Keras). A premade neural network can be selected from an archive based on the types of images the network was built on; for instance, a neural network trained on images of animals or pets would be ideal for a camera trap project interested in identifying medium- to large-sized mammals. To mimic a small-scale camera trap study, we trained a premade, freely available neural network using fewer than 6,000 images from our larger dataset and achieved similar confidence in object identification as the previously mentioned large-scale studies. Here we show that a model trained on a small number of diverse images can be as successful at eliminating false positives and identifying species as a model developed using many thousands of images.