4. DISCUSSION
The machine learning pipeline we used in this study appears to be an
effective tool for collecting occurrence data across a range of habitat
types at our target site. The lack of statistically significant
differences in relative detection frequencies between the audio and
camera trap data conflicted slightly with our expectation that acoustic
sampling would yield more accurate occurrence metrics than camera trap
sampling. However, the sample size of our audio dataset was much larger
than that collected by the camera trap network over a roughly similar
period of time (n = 122 before filtering for season), which suggests
that acoustic monitoring is capable of yielding much higher data
densities per unit surveying time, at least for vocal species.
Similarly, increasing the sample size of the camera trap dataset and
collecting audio samples from the wet season may yet allow us to
identify true underlying differences in detection probabilities for
tinamous when surveyed acoustically versus visually.
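To make the comparison of relative detection frequencies concrete, the sketch below computes a chi-square statistic of homogeneity for species-by-method detection counts. It is an illustration only: the choice of test, the function name `chi_square_homogeneity`, and the counts are assumptions made for the example, not the analysis or data reported in this study.

```python
def chi_square_homogeneity(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts.

    Rows are survey methods and columns are species; the null hypothesis is
    that relative detection frequencies are the same for every method.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the method had no effect on species frequencies
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical detection counts for three species (NOT data from this study)
counts = [
    [30, 50, 20],  # acoustic detections per species
    [20, 20, 10],  # camera trap detections per species
]
stat, dof = chi_square_homogeneity(counts)
print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

With 2 degrees of freedom, the critical value at α = 0.05 is 5.991, so a statistic below that threshold would not indicate a difference in relative detection frequencies between the two methods.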
The significant differences in detection frequency we observed between
our data and the eBird data are likely a result of non-random spatial
sampling. An example of this spatial non-randomness with a clear
causative explanation is the relatively higher eBird detection frequency
for C. undulatus, a species that is widely present in floodplain
and transitional forest but is also extremely common in edge habitat
near the station dwellings where ecotourists and birders visiting the
station spend time when not hiking on trails (eBird, 2017; personal
obs). We chose not to include C. strigulosus in frequency
analyses as it is represented in our audio dataset mainly by detections
at sites east of the Río Los Amigos that birders and ecotourists
visiting the station are rarely if ever able to access, therefore
heavily limiting its sampling density in the eBird dataset (personal
obs). Even in the absence of a quantitative assessment, however, we
believe this is another clear case of spatially non-random
eBird sampling patterns relative to the more structured audio and camera
trap data. We therefore advise caution when using eBird data to generate
site-level relative occurrence frequencies for tropical forest birds, as
doing so properly requires a substantially better-informed set of sample
bias corrections than we chose to use for this illustratively naïve
approach. eBird’s own Status and Trends methods are a classic example of
how this can be done analytically, though the relatively low eBird data
density across the Neotropics has meant that analyses using these
methods have mainly been focused on the temperate zone (Sullivan et al.,
2009; Sullivan et al., 2014; Fink et al., 2018). Employing study designs
that use eBird data as an adjunct to more structured surveying
techniques is another possible strategy (Reich et al., 2018), as it
reduces the share of overall bias that eBird data introduce into
ecological modeling efforts in this region while retaining the benefits
of using multiple independent datasets to address the same question.
A common question posed by research scientists in the pursuit of an
efficient yet effective machine learning platform is “how much training
data is enough data?” Our two-pass classification strategy demonstrated
clear classification accuracy improvements over a single pass, though
the degree to which our ensemble modeling strategy improved
classification performance varied substantially between classes. We
suspect that most of the performance improvements that could be gained
beyond what we saw in our analysis would come from gathering additional
survey data, iterating the data collection and training processes to
increase sample sizes, and further improving the model architecture and
hyperparameters. It is important to note that the main limiting factor
for our use of machine learning classification has been the amount of
computational power available to us, which required us to decrease the
complexity of our neural networks and the resolution of our spectrograms
relative to those mentioned in the literature (Knight et al., 2017; Kahl
et al., 2019). While doing so allowed us to produce classification
results within acceptable time constraints, this speed benefit
potentially came at the cost of reduced classification accuracy. An
important future goal for our analyses is to secure sufficient
computational power to run the classification at full resolution to
quantify improvements in accuracy, as we strongly believe that
understanding the minimum acceptable resolution necessary to achieve a
given level of accuracy is a crucial logistical consideration for
researchers seeking to build hardware systems to support similar data
processing pipelines.
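The trade-off between spectrogram resolution and computational load can be approximated with simple arithmetic. The sketch below counts the time-frequency cells a classifier would receive for one audio clip under two settings. The clip duration, sample rates, FFT window sizes, and hop length are hypothetical values chosen for illustration and are not the settings used in this study; the function name `spectrogram_shape` is likewise an assumption.

```python
def spectrogram_shape(duration_s, sample_rate_hz, n_fft, hop_length):
    """Return (frequency_bins, time_frames) for a one-sided STFT.

    Assumes no padding: only windows that fit entirely inside the clip count.
    """
    n_samples = round(duration_s * sample_rate_hz)
    if n_samples < n_fft:
        raise ValueError("clip is shorter than one FFT window")
    freq_bins = n_fft // 2 + 1                      # one-sided spectrum
    time_frames = 1 + (n_samples - n_fft) // hop_length
    return freq_bins, time_frames


# Hypothetical settings (NOT the values used in this study)
full = spectrogram_shape(3.0, 48_000, n_fft=1024, hop_length=512)
reduced = spectrogram_shape(3.0, 24_000, n_fft=512, hop_length=512)

full_cells = full[0] * full[1]
reduced_cells = reduced[0] * reduced[1]
print(f"full: {full}, reduced: {reduced}")
print(f"input size ratio: {full_cells / reduced_cells:.2f}")
```

In this hypothetical configuration, halving both the sample rate and the FFT window size shrinks the network input roughly fourfold, which is the kind of saving that has to be weighed against any loss of classification accuracy.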
Acoustic monitoring represents a promising method for studying bird
biology and life history. We are particularly excited by the prospect of
being able to use these SWIFT survey data in future analyses to identify
the life history and microhabitat characteristics that result in niche
partitioning in the tinamou community of lowland Madre de Dios. We
anticipate that additional data collection, particularly during the wet
season, and further refinement of this machine learning pipeline will
allow us to build occupancy models for these species using elevation
maps and vegetation structure datasets that were collected for use with
the associated camera trap grid as environmental covariates (Royle &
Nichols, 2003).
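As a pointer to how such models could be structured, the sketch below evaluates the site-level likelihood under the abundance-induced heterogeneity model of Royle & Nichols (2003), in which the probability of detecting a species at a site holding N individuals is p = 1 - (1 - r)^N and N is Poisson distributed with mean λ. The function names, the parameter values, the truncation point `max_abundance`, and the log-linear covariate effect are assumptions made for illustration; this is not a model we have fitted.

```python
import math


def expected_abundance(beta0, beta1, covariate):
    """Log-linear link between a site covariate (e.g. elevation) and lambda."""
    return math.exp(beta0 + beta1 * covariate)


def rn_detection_probability(r, abundance):
    """Probability of detecting at least one of `abundance` individuals."""
    return 1.0 - (1.0 - r) ** abundance


def rn_site_likelihood(detections, visits, lam, r, max_abundance=100):
    """Marginal likelihood of `detections` in `visits` surveys at one site.

    Sums the binomial detection likelihood over the unobserved abundance N,
    weighted by a Poisson(lam) prior truncated at `max_abundance`.
    """
    total = 0.0
    for n in range(max_abundance + 1):
        prior = math.exp(-lam) * lam ** n / math.factorial(n)
        p = rn_detection_probability(r, n)
        binom = (math.comb(visits, detections)
                 * p ** detections
                 * (1.0 - p) ** (visits - detections))
        total += prior * binom
    return total


# Hypothetical values: standardized elevation of 0.5, three survey visits
lam = expected_abundance(beta0=0.2, beta1=0.4, covariate=0.5)
likelihood = rn_site_likelihood(detections=2, visits=3, lam=lam, r=0.3)
print(f"lambda = {lam:.3f}, likelihood = {likelihood:.4f}")
```

Multiplying such site-level likelihoods across sites and maximizing over the coefficients and r would give parameter estimates; in practice an established occupancy modeling package would be preferable to hand-written code.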