2.2 Pre-processing steps
To automate the selection of suitable images for image-matching we developed a five-step image pre-processing method (Figure 2).
2.2.1 Detecting and cropping individuals from images
The aim of the first step in the image pre-processing method was to automatically detect and crop wild dog individuals from the images. To do this, we used the Microsoft AI for Earth MegaDetector (hereafter ‘MegaDetector’; Beery, Morris & Yang, 2019), which automatically detects animals in images so that they can be cropped from the frame. We assessed the efficacy of this method by visually recording the presence of wild dogs in a subset of 1060 images from the Kenyan dataset and 246 images from the Zimbabwean dataset, and comparing the results to the cropped images (hereafter ‘crops’) produced by the MegaDetector for the same subset. In this way, we obtained the MegaDetector’s numbers of true positives (wild dogs that were successfully detected), false positives (detections that did not contain a wild dog), and false negatives (wild dogs that were found by visual inspection but not by the MegaDetector). All images contained wild dogs, so there were no true negatives in the dataset.
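As an illustration of this step, the minimal sketch below crops animal detections from a MegaDetector batch-output JSON file. It assumes the standard MegaDetector output format (normalised [x, y, width, height] bounding boxes, with category ‘1’ denoting animals); the file paths and confidence threshold are placeholders rather than values from our pipeline.

```python
import json
from pathlib import Path

from PIL import Image

# Placeholder paths and threshold (not values from the study)
DETECTIONS_JSON = "megadetector_output.json"
IMAGE_DIR = Path("images")
CROP_DIR = Path("crops")
CONF_THRESHOLD = 0.8

CROP_DIR.mkdir(exist_ok=True)

with open(DETECTIONS_JSON) as f:
    results = json.load(f)

for entry in results["images"]:
    image = None
    for i, det in enumerate(entry.get("detections") or []):
        # In the MegaDetector output format, category "1" is "animal"
        if det["category"] != "1" or det["conf"] < CONF_THRESHOLD:
            continue
        if image is None:
            image = Image.open(IMAGE_DIR / entry["file"])
        # Bounding boxes are normalised [x_min, y_min, width, height]
        x, y, w, h = det["bbox"]
        box = (int(x * image.width), int(y * image.height),
               int((x + w) * image.width), int((y + h) * image.height))
        image.crop(box).save(CROP_DIR / f"{Path(entry['file']).stem}_det{i}.jpg")
```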
2.2.2 Aspect-ratio filtering
The aim of the second step in the image pre-processing method was to filter out images that were unsuitable for identification due to the individual’s body rotation in the image. We considered crops suitable for image-matching if approximately ≥80% of the individual’s flank was visible and the angle between the image axis and the animal’s flank was less than approximately 30°, i.e., the flank was facing the camera. Crops where the angle between the image axis and the animal’s flank was more than 30° were expected to be narrower than crops suitable for image-matching, and therefore to have a relatively low aspect-ratio (crop width divided by crop height). By contrast, crops where the flank was concealed because the individual was lying down, or was obscured by vegetation, were expected to be considerably wider and to have a relatively high aspect-ratio. These criteria were visually assessed for the crops that the MegaDetector produced in the previous step. We then calculated the range of aspect-ratios for suitable crops, i.e., those where an unobscured flank was facing the camera, using the “jpeg” package (Urbanek, 2021) in Program R (version 4.0.4; R Core Team, 2020). Images with an aspect-ratio outside this range were removed from the dataset.
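A minimal Python sketch of this filtering logic is shown below (the analysis itself used the “jpeg” package in R); the aspect-ratio bounds are placeholders, since the actual range was derived empirically from the crops judged suitable.

```python
from pathlib import Path

from PIL import Image

# Placeholder bounds: the real range was calculated from crops in which
# an unobscured flank was facing the camera
MIN_ASPECT, MAX_ASPECT = 1.2, 3.0  # hypothetical width/height limits


def filter_by_aspect_ratio(crop_dir, min_ar=MIN_ASPECT, max_ar=MAX_ASPECT):
    """Return paths of crops whose width/height ratio falls inside the range."""
    suitable = []
    for path in Path(crop_dir).glob("*.jpg"):
        with Image.open(path) as img:
            aspect = img.width / img.height
        if min_ar <= aspect <= max_ar:
            suitable.append(path)
    return suitable
```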
2.2.3 Selecting standing individuals
Not all sitting or lying individuals could be filtered out solely using image aspect-ratios. Therefore, the aim of the third step in the image pre-processing method was to filter out the remaining crops that were unsuitable for identification because the individual’s body position (i.e., sitting or lying) obscured the full coat pattern. To do this, we trained a convolutional neural network (CNN) to classify crops as containing either a standing or a sitting wild dog. To obtain data to train this image classifier, we used the full image catalogues from both sites (n = 11205). The crops produced by steps 1 and 2 of the pre-processing (n = 21745) were manually classified as containing either a standing wild dog (n = 13500) or a sitting wild dog (n = 6512). We removed all crops depicting anything other than wild dogs (e.g., birds, rocks or logs), or wild dogs for which it could not be confirmed whether they were standing or sitting because only a small part of the animal was visible (n = 1733). We then trained the CNN on the remaining 20012 crops to classify each as containing a standing wild dog or not. The CNN was built using TensorFlow (Abadi et al., 2016) in Python (version 3.6.10). The model was trained with 16012 crops, validated with 2000 crops, and tested with 2000 crops.
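A minimal sketch of how such labelled crops could be loaded for training is shown below; it assumes a recent TensorFlow version, and the directory layout, image size and split fraction are illustrative rather than the exact counts reported above.

```python
import tensorflow as tf

# Hypothetical layout: one sub-folder per class (crops/standing, crops/sitting)
common = dict(
    directory="crops",
    label_mode="binary",      # two classes: standing vs. sitting
    image_size=(128, 128),    # illustrative input resolution
    validation_split=0.1,     # illustrative split fraction
    seed=42,                  # fixed seed so the two subsets do not overlap
)
train_ds = tf.keras.utils.image_dataset_from_directory(subset="training", **common)
val_ds = tf.keras.utils.image_dataset_from_directory(subset="validation", **common)
```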
CNNs consist of convolutional layers (Albawi, Mohammed & Al-Zawi, 2017): layers of filters that digitally ‘slide’ over the image and aim to recognise specific features. The convolutional layers pass a map of these features to the next layer, a max pooling layer. The max pooling layer reduces the resolution of the feature map, thus reducing the importance of the position of features within it. This step can help prevent the model from becoming too fine-tuned to the training data, which causes over-fitting and reduces the generalisability of the classifier. After this, a dropout layer is applied, which randomly removes 50% of the connections made between layers during training. This benefits the model by teaching it to recognise robust features, again preventing over-fitting. The data are then passed to a flattening layer, which converts the feature maps into a one-dimensional vector that is passed on to the final two layers. First, the vector goes through a fully connected layer, which combines all the data from the previous layer and produces prediction scores. Second, a final layer turns these scores into a single prediction: standing, or not standing (for a more detailed description of CNNs, see O’Shea & Nash, 2015 and Albawi, Mohammed & Al-Zawi, 2017).
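This layer sequence can be sketched in Keras as follows; the input size and layer widths are illustrative, since the final values were selected with KerasTuner as described below.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative architecture following the sequence described above
model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),           # input size is an assumption
    layers.Conv2D(32, (3, 3), activation="relu"),  # filters slide over the image
    layers.MaxPooling2D((2, 2)),                   # down-sample the feature maps
    layers.Dropout(0.5),                           # drop 50% of connections
    layers.Flatten(),                              # feature maps -> 1-D vector
    layers.Dense(64, activation="relu"),           # fully connected scoring layer
    layers.Dense(1, activation="sigmoid"),         # standing vs. not standing
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```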
The number of convolutional layers and the number of filters per layer were optimised using KerasTuner (O’Malley et al., 2019). KerasTuner runs CNNs across a range of hyperparameter values and automatically selects the model with the highest validation accuracy, i.e., the proportion of correct classifications on the validation dataset. KerasTuner ran CNNs with between one and three convolutional layers, with 16, 32 or 64 filters per layer, and with a kernel size (the dimensions of each filter) of 3x3 pixels. This was done for 20 random combinations of the number of convolutional layers and the number of filters per layer. Test runs showed that the maximum accuracy was reached before the 70th epoch, so each combination was run for 70 epochs, meaning that the training data were passed through the CNN 70 times. The learning rate of the model, that is, the size of the adjustments the model makes to its weights during training, was also optimised with KerasTuner, testing rates of 10⁻³, 10⁻⁴ and 10⁻⁵ with the optimal number of convolutional layers. The model with the highest test accuracy was selected as the final model.
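The tuning procedure can be sketched with KerasTuner as follows. For brevity, the sketch searches the number of layers, the filters per layer and the learning rate jointly, whereas we tuned the learning rate in a second pass with the optimal number of convolutional layers; train_ds and val_ds are assumed to be prepared datasets, as in the loading sketch above.

```python
import keras_tuner as kt
import tensorflow as tf
from tensorflow.keras import layers


def build_model(hp):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(128, 128, 3)))  # input size is an assumption
    # One to three convolutional layers with 16, 32 or 64 filters of 3x3 pixels
    for i in range(hp.Int("conv_layers", 1, 3)):
        model.add(layers.Conv2D(hp.Choice(f"filters_{i}", [16, 32, 64]),
                                (3, 3), activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.5))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dense(1, activation="sigmoid"))
    # Learning rates of 10^-3, 10^-4 and 10^-5, as in the text
    lr = hp.Choice("learning_rate", [1e-3, 1e-4, 1e-5])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model


# 20 random hyperparameter combinations, each run for 70 epochs
tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=20)
tuner.search(train_ds, epochs=70, validation_data=val_ds)
best_model = tuner.get_best_models(num_models=1)[0]
```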
2.2.4 Separating left and right flanks
The aim of the fourth step in the image pre-processing method was to separate crops depicting the left and right flanks of a wild dog, because image-matching software packages can only match images of one side of the animal. To do this, we built another CNN to automate the separation of left and right flanks. To obtain training data for this CNN, we visually classified all crops of standing dogs used for the CNN in step three whose flank was facing the camera (n = 12357) as showing the right (n = 6140) or left flank (n = 6217). We optimised this CNN’s parameters as described in step three of the image pre-processing method, using KerasTuner to find the optimal number of convolutional layers and learning rate. Each CNN ran for 100 epochs, because test runs showed that this model took longer than the previous model to reach its maximum accuracy. The first layer of this CNN was an average pooling layer, which reduced the resolution of the input images by a factor of four to prevent over-fitting. This layer was added because preliminary runs showed that this CNN was more prone to over-fitting than the CNN developed in step three of the image pre-processing method. We used 9857 crops as training data, 1246 as validation data, and 1246 as testing data. All other layers were identical to those of the previous CNN. For the full model conditions, see Table S1 in Supporting Information.
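A sketch of this second CNN, which differs from the first only in its initial average pooling layer, could look as follows; as before, the input size and layer widths are illustrative (see Table S1 for the full model conditions).

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 3)),           # input size is an assumption
    # 2x2 average pooling halves each dimension, i.e. a four-fold reduction
    # in pixel count, which reduced this model's tendency to over-fit
    layers.AveragePooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),         # right flank vs. left flank
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```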
2.2.5 Image background removal
Lastly, we removed the image backgrounds of suitable crops using the “rembg” package in Python (Gatis, 2020). Removing backgrounds eliminated the risk of the background confounding image-matching results, while also removing the need to manually select an individual’s flank.
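A minimal sketch of this final step is shown below, assuming a recent rembg version in which remove() accepts and returns a PIL image; the directory names are placeholders.

```python
from pathlib import Path

from PIL import Image
from rembg import remove

# Placeholder directory names
IN_DIR, OUT_DIR = Path("flank_crops"), Path("flank_crops_nobg")
OUT_DIR.mkdir(exist_ok=True)

for path in IN_DIR.glob("*.jpg"):
    with Image.open(path) as img:
        result = remove(img)  # returns the crop with a transparent background
    # Save as PNG to preserve the transparency
    result.save(OUT_DIR / f"{path.stem}.png")
```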