2.4 Performance of the image-matching software
To test which image-matching software most accurately matched crops of
the same individual, we created two separate datasets for the Kenyan and
Zimbabwean populations. To select suitable crops, we used the four-step
image pre-processing method described above. We also visually inspected
the discarded crops to ensure that no suitable crops had been
erroneously rejected. We then manually
identified individuals from the dataset of right-flank crops, to provide
a standard against which automated identifications could be compared,
and randomly selected two crops per individual. To prevent similar
lighting conditions and posture from biasing the software towards
matching images of the same individual, we ensured that the two selected
crops of each individual were taken on different days. The two resulting
datasets comprised 104 individuals
from the Kenyan population, and 48 individuals from the Zimbabwean
population. To increase the dataset for the Zimbabwean population, we
also included left-flank crops for 41 individuals and horizontally
mirrored the crops to enable comparison with the right-flank crops. This
increased the total number of unique flanks from the Zimbabwean
population to 89. The coat pattern of wild dogs differs between the
right and left flanks, so each mirrored left flank effectively
constitutes an additional unique pattern, and we have no reason to
expect that including left-flank crops would bias our results.
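As an illustrative sketch (not necessarily the tooling used in this
study), the horizontal mirroring step can be reproduced in R with the
“magick” package; the directory names below are hypothetical:

    library(magick)  # general-purpose image processing in R

    # Hypothetical directory layout: mirror every left-flank crop so it
    # can be compared against the right-flank crops.
    dir.create("crops/left_flank_mirrored", showWarnings = FALSE)
    for (f in list.files("crops/left_flank", full.names = TRUE)) {
      img <- image_read(f)
      mirrored <- image_flop(img)  # mirror along the vertical axis
      image_write(mirrored,
                  file.path("crops/left_flank_mirrored", basename(f)))
    }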
We analysed the Kenyan dataset with each of the three image-matching
software packages: Hotspotter, WildID, and
I3S-Pattern. We then analysed the Zimbabwean dataset
with Hotspotter and WildID. I3S-Pattern was not tested with the
Zimbabwean dataset because tests with the Kenyan dataset showed it to be
considerably less accurate than the other software packages, and because
inputting images and assigning reference points in I3S-Pattern was
substantially more time-consuming.
We also examined whether image background removal increased the accuracy
of WildID and Hotspotter. I3S-Pattern was not included in this
analysis because it requires users to manually trace the outline of the
animal within the program, and therefore does not take the background
into account in its default use. We compared the
image-matching results obtained using images from which we manually
cropped just the individuals’ flanks, with those based on crops of
complete individuals from which the background was automatically removed
(see Figure S2). For three of the 178 images from the Zimbabwean site,
the background-removal algorithm failed to isolate the wild dog,
instead cropping out vegetation in the foreground; for these images, we
used a manually cropped flank of the wild dog.
To compare the image-matching performance of each software package, we
examined the 10 crops that each package ranked as most similar to the
query image. We used the first 10 ranked images because the
best-performing software’s accuracy began to level off around this
rank, indicating that inspecting the first 10 matches could maximise
recognition rates while minimising the time spent visually inspecting
and confirming potential matches. We used a mixed-effects logistic
regression to test for differences in the efficacy of the software
packages. Here, the response variable was a binary variable describing
whether or not an individual was successfully matched in the first 10
ranked images, and software package was the explanatory variable.
Individual identity was included as a random effect to avoid
pseudoreplication. Post-hoc pairwise comparisons were carried out using
Tukey contrasts. This analysis was performed separately for the
Zimbabwean and Kenyan datasets. Models were run using the “lme4”
package (v. 1.1-27.1; Bates et al., 2015) in R (v. 4.0.4; R Core Team,
2020).
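A minimal sketch of this model in R, assuming the “multcomp” package
for the Tukey contrasts (the data frame and variable names below are
illustrative, not those of our scripts):

    library(lme4)      # mixed-effects models
    library(multcomp)  # Tukey post-hoc contrasts

    # 'matches' (illustrative): one row per query image, with a binary
    # 'matched' outcome (TRUE if the correct individual appeared among
    # the first 10 ranked images), the software package used (a
    # factor), and the identity of the query individual.
    m1 <- glmer(matched ~ software + (1 | individual),
                data = matches, family = binomial)

    # Post-hoc pairwise comparisons between software packages
    summary(glht(m1, linfct = mcp(software = "Tukey")))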
Previous studies have shown that the image-matching performance of
different software packages is affected by database size (Matthé et al.,
2017). Therefore, to compare software performance on wild dogs from
Kenyan and Zimbabwean populations, we randomly selected a subset of the
Kenyan individuals to equal the number of unique flanks in the
Zimbabwean dataset (n = 89). We then used the best-performing software
package identified in the previous step of the analysis to rerun the
image-matching analysis for both datasets. Differences in software
performance between the two populations were then assessed using a
mixed-effects logistic regression (binomial error distribution with a
logit link function). The response
variable in the model was whether or not a match was detected in the
first 10 ranked images, and study site (Kenya or Zimbabwe) was the
explanatory variable. To correct for possible differences in image
quality, two proxies for image quality were included in the model.
Firstly, we included image size (total number of pixels of the crop) as
a continuous predictor. Secondly, all images were visually scored on a
scale of 1 to 3, based on how well their distinct marks could be
recognised. This approach followed Nipko, Holcombe & Kelly (2020),
where score 1 was given to images that were out of focus, of a moving
animal, or badly lit, score 2 was given to images of intermediate
quality, and score 3 was given to images where all features were clearly
visible (for examples, see Figure S3). Score was included as a fixed
effect, and individual identity was included as a random effect.
Furthermore, a Wilcoxon rank-sum test was performed to test for
differences in the quality scores of crops from Kenya and Zimbabwe. The
model was fitted using the “lme4” package (v. 1.1-27.1; Bates et al.,
2015) in R (v. 4.0.4; R Core Team, 2020).
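The subsampling, the site comparison, and the quality-score test could
look like the following sketch in R (again with illustrative names;
'n_pixels' is the crop size in pixels and 'score' the 1-3 quality
score):

    library(lme4)

    # Randomly subsample the Kenyan flanks to match the size of the
    # Zimbabwean dataset (n = 89); the seed is illustrative.
    set.seed(1)
    kenya_flanks <- unique(matches$flank_id[matches$site == "Kenya"])
    keep <- sample(kenya_flanks, 89)
    subset_df <- subset(matches, site == "Zimbabwe" | flank_id %in% keep)

    # Binary outcome (match in the first 10 ranked images) against
    # study site, with image size and quality score as covariates and
    # individual identity as a random effect. scale() standardises the
    # pixel counts, which aids model convergence.
    m2 <- glmer(matched ~ site + scale(n_pixels) + score +
                  (1 | individual),
                data = subset_df, family = binomial)
    summary(m2)

    # Wilcoxon rank-sum test for differences in quality score between
    # the two sites
    wilcox.test(score ~ site, data = subset_df)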
3. Results