Figure 5. Boxplot of the feature ODBA.
UMAP is a powerful nonlinear dimensionality-reduction technique that is also well suited to visualising high-dimensional data (McInnes, Healy, Saul, & Grossberger, 2018); here we use it to transform collections of features and visualise them in a two-dimensional plot. UMAP has already found niches in bioinformatics, materials science and machine learning (McInnes et al., 2018). Within the broad field of biology, it has been used in bioacoustics studies (e.g., Sainburg, Theilman, Thielk, & Gentner, 2019), but rarely in animal behaviour studies. In the rabc package we use UMAP to plot the different behaviours, represented by differently coloured symbols, in the two-dimensional space. Ideally, each behaviour forms an isolated cluster of symbols within this space. In this way, UMAP provides an indication of how the final classification model will perform, with isolated behaviour clusters indicating high classification accuracy. Where clusters overlap, researchers may wish to consider grouping the behaviours concerned, because they may not be adequately separable using ACC data. Conversely, where a single behaviour is spread out over the plot, splitting it into multiple behaviour types may be an option.
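The visual judgement of cluster isolation and overlap described above can be complemented by a simple numerical check. The following is a minimal sketch in Python and is not part of the rabc package. It assumes that the two-dimensional embedding coordinates and the behaviour labels are already available; the function name `neighbour_purity`, the toy coordinates and the behaviour labels are hypothetical. For each behaviour it reports the mean fraction of a point's k nearest neighbours in the embedding that carry the same label: values near 1 suggest an isolated cluster, and low values suggest overlap with other behaviours.

```python
from collections import defaultdict
from math import dist


def neighbour_purity(points, labels, k=3):
    """Mean fraction of each point's k nearest neighbours (Euclidean distance
    in the 2-D embedding) that share its behaviour label, per behaviour."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")

    per_label = defaultdict(list)
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Rank all other points by their distance to the current point.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        same = sum(labels[j] == lab for j in others[:k])
        per_label[lab].append(same / k)

    return {lab: sum(v) / len(v) for lab, v in per_label.items()}


# Toy embedding (made-up coordinates, not real data): "flight" is isolated,
# whereas "stand" and "sit" are interleaved.
points = [
    (10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (10.5, 10.5),  # flight
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),          # stand
    (0.5, 0.0), (0.0, 0.5), (1.0, 0.5), (0.5, 1.0),          # sit
]
labels = ["flight"] * 4 + ["stand"] * 4 + ["sit"] * 4

purity = neighbour_purity(points, labels, k=3)
for behaviour, value in purity.items():
    print(f"{behaviour}: {value:.2f}")
```

In this toy example the isolated behaviour scores much higher than the two interleaved ones, which would flag the latter pair as candidates for grouping. Because a UMAP embedding depends on its tuning parameters, such a score is at most a rough aid to reading the plot and not a substitute for evaluating the final classification model.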
We implemented the UMAP visualization as a Shiny App to facilitate user interaction. The app has three tabs, each serving a different function. Tab 1, “UMAP calculation and tuning”, assists with evaluating whether the ACC features adequately represent the behaviours. Tab 2, “Feature visualization through UMAP”, shows how feature values vary across the two-dimensional UMAP plot. Tab 3, “Selected features”, assists with evaluating how well the selected features differentiate between the behaviours. Fig. 6 shows screenshots of the three tabs, loaded with the time- and frequency-domain features from the white stork dataset. The different behaviours generally separate well (Fig. 6a), suggesting good potential for developing a behaviour classification model with satisfactory performance. In the second tab (Fig. 6b) we selected the ODBA feature; the plot shows how its value varies across the behaviour types, with active flight having distinctly high ODBA values, followed by walking, then passive flight, standing and sitting. Finally, in the third tab (Fig. 6c), we used only the six features identified by the function select_features to form a new UMAP plot. These features preserve the manifold structure of the different behaviours. A demo of this Shiny App can be accessed at <https://huiyu-deakin.shinyapps.io/rabc_UMAP/>.