Discussion

CNN Accessibility

This study demonstrates that AI-based identification and classification models are more accessible than previously thought. Until now, processing of camera trap images has been limited by the need for human observers, expense, processing time, and unfamiliarity with computer science techniques suitable for ecological studies. Commercial labeling services (e.g. Google Cloud) can be unreliable for processing large datasets, and having images labeled and processed currently costs approximately $0.05 per image (Google Cloud), which may not be practical when tens of thousands of images are involved.
Transfer training (also called transfer learning) is an increasingly accurate and efficient method of image processing (e.g. Deepak et al. 2019, Swati et al. 2019, Shi et al. 2019) and is especially desirable for studies with limited data (Shin et al. 2016). Despite improvements in this approach, its use in ecology has been limited. Transfer training saves time and reduces data requirements: smaller studies can spend less time on processing while still calibrating the architecture with their own images and training the model on only a portion of their complete dataset. Additionally, transfer training helps reduce overfitting, which can be an issue when using a small number of images (Deepak and Ameer 2019, Han et al. 2018).
Because transfer training needs fewer images, it is more flexible and more readily applicable for ecologists than other advanced machine learning techniques (Xie et al. 2016). Feature extraction with transfer training gives camera trap projects an alternative to building a CNN architecture from scratch, using a pre-trained CNN product (e.g. Microsoft MegaDetector), or applying unsupervised learning techniques (e.g. cluster analysis).
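To illustrate the mechanics of feature extraction, the following is a minimal, standard-library-only Python sketch. The frozen feature extractor is a hypothetical stand-in for a pre-trained CNN backbone (it is not MegaDetector or any real network), and the toy two-value "images", the data generator, and all function names are illustrative assumptions. Only the small classification head is trained, which is part of why transfer training needs fewer labeled examples than learning every layer from scratch.

```python
import math
import random

# HYPOTHETICAL stand-in for a pre-trained CNN backbone. In a real project these
# would be convolutional layers already trained on a large image collection;
# here they are fixed numbers that are never updated during training.
FROZEN_WEIGHTS = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]


def frozen_features(image):
    """Map a (toy, 2-value) input to a feature vector using frozen weights."""
    return [max(0.0, w0 * image[0] + w1 * image[1]) for w0, w1 in FROZEN_WEIGHTS]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_head(images, labels, epochs=300, lr=0.1):
    """Fit only a new logistic-regression head on top of the frozen features."""
    feats = [frozen_features(x) for x in images]  # backbone is frozen: compute once
    weights, bias = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)
            err = p - y  # gradient of the log-loss with respect to the logit
            weights = [w - lr * err * v for w, v in zip(weights, f)]
            bias -= lr * err
    return weights, bias


def predict(head, image):
    weights, bias = head
    f = frozen_features(image)
    return 1 if sigmoid(sum(w * v for w, v in zip(weights, f)) + bias) >= 0.5 else 0


def make_toy_images(n, rng):
    """Toy two-class data: label 1 when the first value exceeds the second."""
    data = []
    while len(data) < n:
        x = (rng.random(), rng.random())
        if abs(x[0] - x[1]) >= 0.2:  # keep a margin between the two classes
            data.append((x, 1 if x[0] > x[1] else 0))
    return data


rng = random.Random(0)
train = make_toy_images(40, rng)
test = make_toy_images(40, rng)
head = train_head([x for x, _ in train], [y for _, y in train])
test_accuracy = sum(predict(head, x) == y for x, y in test) / len(test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

In TensorFlow's Keras API, a comparable setup loads a pre-trained model such as `tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet")`, sets its `trainable` attribute to `False`, and adds a new output layer; for bounding-box detection the same idea is applied to a pre-trained object detection model.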
By using open-source programs and premade neural networks, models can be built either to simply remove images without animals or to fully automate species classification. This study, along with similar studies (e.g. Tabak et al. 2019), provides evidence that a reliable identification and classification model can be created with open-source tools (e.g. TensorFlow) using transfer learning and premade neural networks. Further, we completed this process with a very limited set of images and achieved encouraging results. This technology could be especially desirable for researchers wishing to eliminate false positives and to quickly sort and label species classes.

Calibration Analysis

Currently, accuracy is the standard metric for evaluating classification models in camera trap studies (Gomez et al. 2016, Norouzzadeh et al. 2018, Swanson et al. 2015). We suggest that customized models also be optimized on the F-1 score rather than on accuracy alone, because accuracy can be heavily biased by TNs (Wolf et al. 2006). This bias is evident in the greater than 20% difference between our test accuracy (TNs excluded) and validation accuracy (TNs included).
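A small worked example, using hypothetical counts rather than this study's data, shows how true negatives inflate accuracy while leaving the F-1 score unchanged. Accuracy is (TP + TN) / (TP + FP + FN + TN), whereas F-1 is the harmonic mean of precision, TP / (TP + FP), and recall, TP / (TP + FN), so TN never enters it.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all images (including empty frames) that were scored correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; true negatives do not appear."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: identical detections, evaluated on image sets that
# differ only in how many empty (true-negative) frames they contain.
tp, fp, fn = 60, 20, 20
for tn in (0, 100, 1000):
    print(f"TN={tn}  accuracy={accuracy(tp, fp, fn, tn):.3f}  F-1={f1_score(tp, fp, fn):.3f}")
# accuracy rises 0.600 -> 0.800 -> 0.964 while F-1 stays at 0.750
```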
The metrics used to optimize a model will depend on the purpose of the project and the resources available to the researcher. The F-1 score is the harmonic mean of precision and recall, each of which can be optimized for different purposes. In a study focusing on rare species (e.g. Alexander et al. 2016, Karanth et al. 1995), recall should be optimized to ensure the detection of all possible occurrences of animals. Alternatively, precision should be optimized if processing time is limited and not every image of an animal is essential for the global analysis. Optimizing precision is ideal for a general survey of common, easily identified animals (e.g. Chitwood et al. 2017).
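To make the trade-off concrete: recall, TP / (TP + FN), is the fraction of images containing animals that the model finds, so it guards against missed detections of rare species; precision, TP / (TP + FP), is the fraction of flagged images that truly contain an animal, so it limits time spent reviewing empty frames. The minimal Python sketch below uses hypothetical scores and labels and assumes an image is flagged when the model's confidence is at or above a chosen threshold; raising that threshold trades recall for precision.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when images scoring >= threshold are flagged as animals."""
    flagged = [y for s, y in zip(scores, labels) if s >= threshold]
    tp = sum(flagged)          # flagged images that truly contain an animal
    fp = len(flagged) - tp     # flagged images that are actually empty
    fn = sum(labels) - tp      # animal images the model missed
    precision = tp / (tp + fp) if flagged else 0.0  # 0.0 by convention if nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical confidence scores; label 1 = animal present, 0 = empty frame.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

for threshold in (0.25, 0.50, 0.85):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.3f}  recall={r:.3f}")
# precision climbs 0.625 -> 0.667 -> 1.000 while recall falls 1.000 -> 0.800 -> 0.400
```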

Optimizing Model Performance

Analyzing model performance during training is especially useful for determining which classes the model is failing to identify, and this is easily visualized using IOU graphs. Precision during training did not appear to depend on the number of images used to train each class; rather, the type of object a class represents was the most important factor. Objects with unique shapes, color patterns, and textures (e.g. turkey and armadillo) were detected more easily by the model (Fig. ). The model was less successful with objects that were small and difficult to distinguish from the background (e.g. grey squirrel), that resembled another class (e.g. coyote and dog), or whose training examples varied widely within the same class (e.g. humans and vehicles).
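For reference, IOU (intersection over union) measures how closely a predicted bounding box overlaps a labeled one: the area of their intersection divided by the area of their union. A minimal Python sketch follows, assuming boxes are given as (xmin, ymin, xmax, ymax) in the same pixel coordinates; a detection is commonly counted as correct when IOU is at least 0.5, although that cutoff is a configurable choice and the helper names here are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred_box, true_box, threshold=0.5):
    """Count a detection as correct when its IOU with the labeled box meets the cutoff."""
    return iou(pred_box, true_box) >= threshold


# A prediction covering half of a 10 x 10 labeled box has IOU 0.5.
print(iou((0, 0, 10, 10), (0, 0, 10, 5)))        # 0.5
print(is_match((0, 0, 10, 10), (5, 5, 15, 15)))  # False (IOU = 25 / 175)
```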
Depending on the aim of the study, the choice of metric allows the researcher to tailor the model toward either identification (ID) or classification (CL). Certain camera trap studies benefit greatly from automating the removal of TNs, especially those focusing on topics such as camera trap effectiveness (e.g. Ferreira-Rodríguez et al. 2019, Edwards et al. 2016) or those in which human-supervised processing is still required to extract details such as behavior. To build a model focused on detecting objects rather than classifying them, researchers should prioritize metrics associated with ID. Such an identification model would decrease processing time and ensure that objects are detected, without requiring accurate species classification by the model. Alternatively, studies focusing on general ecosystem monitoring (e.g. Steenweg et al. 2017, Jiménez et al. 2010) or on the density of common species (e.g. Parsons et al. 2017) would benefit from a CL model and should use CL metrics to build a model fully capable of both identifying and classifying species.
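One way to see the difference, assuming ID is scored class-agnostically (was an object detected at all?) while CL also requires the correct species, is the minimal Python sketch below. The `EMPTY` label, the function name, and the example frames are hypothetical. In the example, the coyote frame predicted as "dog" counts as an ID hit but a CL miss, and the missed grey squirrel counts against both.

```python
EMPTY = "empty"  # hypothetical label for frames with no animal (true negatives)


def id_and_cl_recall(truths, predictions):
    """Image-level ID and CL recall over frames that truly contain an object.

    ID counts a frame as a hit if the model detects something there;
    CL also requires the predicted species to match the labeled species.
    """
    occupied = [(t, p) for t, p in zip(truths, predictions) if t != EMPTY]
    id_hits = sum(1 for _, p in occupied if p != EMPTY)
    cl_hits = sum(1 for t, p in occupied if p == t)
    return id_hits / len(occupied), cl_hits / len(occupied)


truths = ["coyote", "dog", "turkey", EMPTY, "armadillo", "grey squirrel"]
predictions = ["dog", "dog", "turkey", EMPTY, "armadillo", EMPTY]
print(id_and_cl_recall(truths, predictions))  # (0.8, 0.6)
```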
Several methods may be used to adjust a model's performance. CTs are a simple way to bring a model to the desired metric's optimal value. If that optimum cannot be reached by adjusting CTs, the model can be further improved by adding images to classes that the model consistently predicts incorrectly. These additional examples give the model more information from which to learn the features of those objects.
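Choosing which classes should receive additional training images can be made systematic by ranking classes by how often the model gets them wrong. The minimal Python sketch below computes per-class recall from validation predictions; the `min_recall` cutoff of 0.8, the function name, and the example labels are hypothetical choices rather than values from this study.

```python
from collections import Counter


def classes_needing_images(truths, predictions, min_recall=0.8):
    """List classes whose per-class recall is below min_recall, worst first."""
    totals = Counter(truths)
    correct = Counter(t for t, p in zip(truths, predictions) if t == p)
    recall = {c: correct[c] / n for c, n in totals.items()}
    return sorted((c for c, r in recall.items() if r < min_recall), key=recall.get)


truths = ["coyote"] * 4 + ["dog"] * 4 + ["turkey"] * 4
predictions = (["dog", "dog", "coyote", "coyote"]
               + ["dog", "dog", "dog", "coyote"]
               + ["turkey"] * 4)
print(classes_needing_images(truths, predictions))  # ['coyote', 'dog']
```

Here coyote (recall 0.5) and dog (recall 0.75) fall below the cutoff, mirroring the kind of look-alike confusion described above, while turkey (recall 1.0) does not.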
As biodiversity declines worldwide (Kolbert 2014), employing commonly used computer science techniques in future camera trap studies will greatly enhance our ability to monitor wild populations.

Conclusions

  1. Transfer training with bounding boxes is successful and requires far fewer training images than traditional model building.
  2. Identification and classification models built using transfer training and small image sets can be very successful with species that are easily distinguished. Species that are more difficult to distinguish can also be identified but require more training images.
  3. The traditional metric of accuracy can give a false sense of confidence in a model because it is inflated by true negatives. The F-1 score should be used for general purposes because it is not affected by true negatives.
  4. Studies focused solely on removing true negatives do not require model performance as high as studies attempting to classify species.