Figure 5. Boxplot of the feature ODBA.
UMAP is a powerful nonlinear dimensionality-reduction technique that is
also well suited to visualising high-dimensional data (McInnes, Healy,
Saul, & Grossberger, 2018). Here we use it to transform collections of
features and visualise them in a two-dimensional plot. UMAP has already
found its niches in bioinformatics, materials science and machine
learning (McInnes et al., 2018). Within the broad
field of biology, it has been used in bioacoustics studies (e.g.,
Sainburg, Theilman, Thielk, & Gentner, 2019), but it has rarely been
used in animal behaviour studies. In the rabc package we use UMAP to
plot the different behaviours, represented by differently coloured
symbols in the two-dimensional space. The ideal outcome is a
representation in which each behaviour forms an isolated cluster of
symbols within this two-dimensional space. In this way, UMAP provides an
indication of how the final classification model will perform: isolated
behaviour clusters indicate high classification accuracy. Where clusters
overlap, researchers may wish to consider grouping the behaviours
involved, because they may not be adequately separated using ACC data.
Conversely, where a behaviour is spread out across the plot, splitting
it into multiple behaviour types may be an option.
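As a rough way to put a number on the cluster overlap described above, one could compute, for each point in a two-dimensional embedding, the share of its nearest neighbours that carry the same behaviour label. The sketch below is not part of rabc. It is plain Python, and the function name `neighbour_purity`, the choice of k and the toy coordinates are all illustrative assumptions.

```python
from collections import defaultdict
import math


def neighbour_purity(points, labels, k=3):
    """Return {label: mean fraction of the k nearest neighbours sharing that label}.

    points: list of (x, y) embedding coordinates (e.g. from a UMAP run)
    labels: behaviour label for each point
    Hypothetical helper for illustration only; not an rabc function.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")

    per_label = defaultdict(list)
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Brute-force distances from point i to every other point, O(n^2).
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        nearest = [j for _, j in dists[:k]]
        same = sum(labels[j] == lab for j in nearest)
        per_label[lab].append(same / k)

    # Average the per-point scores within each behaviour.
    return {lab: sum(v) / len(v) for lab, v in per_label.items()}


# Toy embedding with made-up coordinates: "flight" forms an isolated
# cluster, while "stand" and "sit" are interleaved along a line.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.2, 5.0), (5.3, 5.0)]
labels = ["flight"] * 4 + ["stand", "sit", "stand", "sit"]
purity = neighbour_purity(points, labels, k=3)
```

In this toy case "flight" scores 1, while "stand" and "sit" each score about one third, which would flag them as candidates for grouping. Because UMAP embeddings are stochastic and only approximately preserve distances, such a score is best treated as a heuristic rather than a prediction of classification accuracy.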
We made the UMAP visualization into a Shiny App to facilitate user
interaction. The Shiny App has three tabs, each providing one function.
Tab 1, “UMAP calculation and tuning”, assists with evaluating whether
ACC features adequately represent behaviours. Tab 2, “Feature
visualization through UMAP”, shows how feature values vary across the
two-dimensional UMAP plot. Tab 3, “Selected features”, assists with
evaluating the performance of selected features in differentiating
between the different behaviours. In Fig. 6 we show screenshots of the
three UMAP tabs, loaded with the time and frequency domain features from
the white stork dataset. The first tab shows that the different
behaviours generally separate well (Fig. 6a), suggesting good potential
for developing a behaviour classification model with satisfactory
performance. In the second tab (Fig. 6b), we selected the ODBA feature;
the plot shows how its value varies across the behaviour types, with
active flight having distinctly high ODBA values, followed by walking,
then passive flight, standing and sitting. Finally, in the third tab
(Fig. 6c), we selected only the six features identified by the function
select_features to form a new UMAP plot. These features preserve the
manifold structure of the different behaviours. A demo of this Shiny App
can be accessed at <https://huiyu-deakin.shinyapps.io/rabc_UMAP/>.
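For readers unfamiliar with the ODBA feature shown in Fig. 5 and Fig. 6b, the sketch below computes it under one common definition: the static component of each axis is approximated by the segment mean, and ODBA is the sum over the three axes of the mean absolute dynamic acceleration. Whether rabc uses exactly this variant is an assumption here, and the function name `odba` and the toy segments are illustrative only.

```python
from statistics import fmean


def odba(x, y, z):
    """Overall dynamic body acceleration for one tri-axial ACC segment.

    Assumes the static (gravitational) component of each axis can be
    approximated by that axis' mean over the segment. Hypothetical
    helper for illustration; not the rabc implementation.
    """
    if not (len(x) == len(y) == len(z)) or len(x) == 0:
        raise ValueError("x, y and z must be non-empty and of equal length")

    total = 0.0
    for axis in (x, y, z):
        static = fmean(axis)                      # static component
        dynamic = [abs(a - static) for a in axis]  # absolute dynamic part
        total += fmean(dynamic)
    return total


# Made-up segments: a motionless animal versus a vigorously moving one.
still = odba([0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1])
active = odba([1, -1, 1, -1], [0, 0, 0, 0], [2, 0, 2, 0])
```

Under this definition the motionless segment yields an ODBA of 0 and the oscillating segment a clearly higher value, in line with the ordering of behaviours reported for Fig. 6b, where more energetic behaviours show higher ODBA.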