Introduction
Observational studies of wildlife occupancy and abundance are more
important than ever as human disturbance has decreased wildlife
population sizes by up to 60% globally in the last four decades (WWF
2018). These staggering declines have heightened the importance of
ecological monitoring through a variety of means, including camera traps,
mark-recapture
methods, point counts, and line transects. Camera traps have become an
especially useful tool for the rapid assessment of wildlife because they
require fewer field hours than other common field methods, may be
reviewed by other researchers, and minimize disturbance to the
environment (Silveira et al. 2003, Steenweg et al. 2017, McCallum 2013).
While camera traps are a useful tool for some ecological studies,
processing massive quantities of images created by camera trap networks
is a major limiting factor for researchers. Until methods are developed
to process images efficiently, this limitation will persist in future
studies, especially as camera trap networks become more complex.
Previous camera trap studies have noted factors which result in large
accumulations of images. Wind, loose shrubbery, camera settings, and
animal behavior specific to each camera site add noise to the dataset
(Newey et al. 2015). The time involved in manually processing these
false triggers, which often represent a majority of captured images, can
delay analysis to the point where conclusions are no longer relevant,
because a large expenditure of resources is often required to process
images manually (Willi et al. 2019).
The increasing use of camera traps for ecological studies has led to a
push for standardized methods to improve the workflow of image analysis
(Glover2019). One promising avenue for processing camera trap images is
the utilization of artificial intelligence (AI) technology.
AI based on convolutional neural networks (CNNs) has been applied to
and tested on several large datasets previously processed by citizen
scientists. Swanson et al. (2015) created and trained a CNN for the
Snapshot Serengeti dataset, which consists of 3.2 million images
collected over 99,241 camera trap days. The output of the neural network
reached an accuracy of greater than 93.8% when compared to the records
of citizen scientists. While several large-scale studies (e.g.
Norouzzadeh et al. 2018) have achieved similar accuracy on such large
datasets, these neural networks require large numbers of images and
substantial computing time to train. Such investments are often not
feasible for smaller camera trap studies, given the current assumption
that many thousands of images are needed to successfully train a model.
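As a concrete illustration of how such agreement figures are computed, the
following minimal Python sketch scores model predictions against reference
labels of the kind supplied by citizen scientists. The function name and the
five example labels are hypothetical and are not taken from any of the cited
studies.

```python
def agreement_accuracy(model_labels, reference_labels):
    """Return the fraction of images where the model matches the reference label."""
    if len(model_labels) != len(reference_labels):
        raise ValueError("both label lists must cover the same images")
    if not reference_labels:
        raise ValueError("at least one labeled image is required")
    matches = sum(m == r for m, r in zip(model_labels, reference_labels))
    return matches / len(reference_labels)


# Invented example: one label per image, "empty" marks a false trigger.
reference = ["zebra", "empty", "wildebeest", "zebra", "empty"]
predicted = ["zebra", "empty", "zebra", "zebra", "empty"]

accuracy = agreement_accuracy(predicted, reference)
print(f"agreement with reference labels: {accuracy:.1%}")
```

A figure computed this way measures agreement with the reference records, so
any errors in those records carry over into the reported accuracy.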
Only the largest camera trap studies have attempted to create their own
neural networks, as it has been suggested that small clusters of images
(~1,000-5,000 images per species class) are not
sufficient for deep learning (e.g. Norouzzadeh et al. 2018). Each model
built by these large-scale studies must be tailored to a particular set of
species in order to function properly because neural networks are a
complex series of algorithms that are used to detect specific features
in supervised data. The neural network learns the features belonging to
each species class, allowing it to differentiate between objects and the
background of images while also classifying objects. Therefore, the
model may not be similar enough to another study’s range of objects and
backgrounds to be useful, even in the same geographical location.
We suggest that the use of transfer training on neural networks has been
overlooked for small-scale camera trap studies. Adapting a neural
network to a dataset by adjusting the final layers of the network
through transfer learning and then reinforcement learning on a desired
image set can be extremely useful, especially when data is scarce. We
predict that a premade neural network could achieve identification
accuracy similar to that of neural networks trained with thousands of
images while not requiring such a large memory footprint. Using a
transfer-trained neural
network allows camera trap surveys to be affordable, data efficient, and
accessible to a broad range of projects.
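The idea of adjusting only the final layers can be sketched in a few lines of
plain Python. This is a toy illustration under stated assumptions, not the
procedure used in this study: the fixed values in FROZEN_WEIGHTS stand in for
the pretrained layers of a real CNN, each two-value input stands in for an
image, and the data are invented. Only the final layer (head and bias) is
updated during training.

```python
import math

# Hypothetical stand-in for pretrained layers; these weights are never updated.
FROZEN_WEIGHTS = [[0.9, -0.4], [-0.3, 0.8], [0.5, 0.5]]


def frozen_features(x):
    """Apply the frozen layer (linear map followed by ReLU) to a 2-value input."""
    return [max(0.0, w[0] * x[0] + w[1] * x[1]) for w in FROZEN_WEIGHTS]


def predict(head, bias, x):
    """Return the head's probability that the input belongs to the 'animal' class."""
    z = sum(h * f for h, f in zip(head, frozen_features(x))) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=200, lr=0.5):
    """Fit only the final layer by stochastic gradient descent on the log loss."""
    head, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            feats = frozen_features(x)
            error = predict(head, bias, x) - y  # d(log loss)/dz for a sigmoid output
            head = [h - lr * error * f for h, f in zip(head, feats)]
            bias -= lr * error
    return head, bias


# Invented toy data: label 1 = animal present, label 0 = false trigger.
animals = [(1.0, 0.1), (0.9, 0.2), (0.8, 0.0), (1.1, 0.3)]
empties = [(0.1, 1.0), (0.2, 0.9), (0.0, 0.8), (0.3, 1.1)]
samples = animals + empties
labels = [1] * len(animals) + [0] * len(empties)

head, bias = train_head(samples, labels)
for x in samples:
    print(x, round(predict(head, bias, x), 3))
```

In practice the frozen layers would come from a pretrained network supplied by
a deep learning framework, and the retrained head would have one output per
species class rather than a single animal-versus-empty score.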
Neural networks are used for all types of image processing and many are
freely available through open-source software (e.g. Google, PyTorch,
Keras). A premade neural network can be selected from an archive based
on the types of images the network was built on; for instance, a neural
network trained on animals/pets would be ideal for a camera trap project
interested in identifying medium- to large-sized mammals. To mimic a
small-scale camera trap study, we trained a premade, freely available
neural network using fewer than 6,000 images from our larger dataset and
achieved confidence in object identification similar to that of the
previously mentioned large-scale studies. Here we show that a small number
of diversified images can be as successful at eliminating false positives
and identifying species as a model developed using many thousands of
images.