2.4 Feature selection
Feature selection is the process of selecting a subset of relevant features for use in model building (Chakravarty, Cozzi, Ozgul, & Aminian, 2019). In animal behaviour studies using ACC, tens of features are typically used in model building (e.g., Shamoun-Baranes et al., 2012). Although this is a relatively small number compared to many other machine-learning applications, there may still be redundancy in the feature set. Redundant features are features that show high correlation with other features and are thus likely to contribute similarly to the behaviour classification model. The feature set may also contain “irrelevant” features that hardly contribute to the classification model. Feature selection in this package serves three aims. Firstly, fewer features make the model easier to interpret; for instance, there may be biomechanical connections between features and the resulting classification model (e.g., Chakravarty et al., 2019). Secondly, fewer features reduce the risk of overfitting and may thereby lead to better behaviour classification from ACC data. Thirdly, because of their lower computational requirements, reduced feature sets have greater potential to be calculated on board the ACC devices themselves, e.g. light-weight tracking devices (e.g., Korpela et al., 2020; Nuijten, Gerrits, Shamoun-Baranes, & Nolet, 2020), from which classifications can either be stored or relayed to receiving stations.
The rabc package’s select_features function uses a combination of a filter and a wrapper feature selection method. The filter removes redundant features based on the absolute values of the pairwise correlation coefficients between features. If two features are highly correlated, the function compares the mean absolute correlation of each of the two features with all other features and removes the feature with the larger mean absolute correlation. The threshold correlation coefficient is user-defined (default: cutoff = 0.9). By default, the filter is turned off (i.e., filter = FALSE).
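To make the filter step concrete, the following is a minimal sketch of the procedure described above, written in base R (illustrative only, not the package’s internal code; the function name filter_redundant is hypothetical, and caret::findCorrelation offers a comparable procedure):

```r
# Minimal sketch of the correlation filter described above (illustrative,
# not the package's internal code). "features" is a numeric data frame or
# matrix of ACC features.
filter_redundant <- function(features, cutoff = 0.9) {
  cors <- abs(cor(features))  # pairwise absolute correlation coefficients
  diag(cors) <- 0             # ignore self-correlations
  while (max(cors) > cutoff) {
    # locate one feature pair whose absolute correlation exceeds the cutoff
    pair <- which(cors == max(cors), arr.ind = TRUE)[1, ]
    # drop the member of the pair with the larger mean absolute correlation
    drop <- pair[which.max(rowMeans(cors)[pair])]
    cors <- cors[-drop, -drop, drop = FALSE]
  }
  features[, colnames(cors), drop = FALSE]  # retained features only
}
```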
The purpose of the wrapper is to select the most relevant features. The wrapper applies stepwise forward selection (SFS; Toloşi & Lengauer, 2011) using the extreme gradient boosting (XGBoost) model, which is used not only for feature selection but also for the final classification model (see below). XGBoost is a scalable tree boosting method that has been shown to outperform other tree boosting methods and random forests (Chen & Guestrin, 2016). In our own experience, XGBoost is also fast to train and performs well with a limited number of trees. The maximum number of features to select (no_features) defaults to 5 but can be user-defined; it also determines how many rounds of SFS are conducted. In the first round, each feature is individually used to train a classification model with XGBoost, and the feature with the highest overall accuracy is added to the selected feature set. In every following round, each remaining feature is combined with the selected feature set to train a classification model, and the feature yielding the highest accuracy is added to the set. The process stops when the number of rounds equals the no_features setting.
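The SFS loop can be sketched in a few lines of R (a simplified illustration, not the package’s internal code: the function name sfs is hypothetical, labels are assumed to be integers coded 0..k-1 as required by XGBoost’s multi:softmax objective, and accuracy is computed on the training data for brevity, whereas held-out data should be used in practice):

```r
library(xgboost)

# Simplified SFS sketch (illustrative): "features" is a numeric matrix with
# column names; "labels" is an integer vector of behaviour classes, 0..k-1.
sfs <- function(features, labels, no_features = 5) {
  selected <- character(0)
  for (round in seq_len(no_features)) {
    remaining <- setdiff(colnames(features), selected)
    acc <- sapply(remaining, function(f) {
      subset <- as.matrix(features[, c(selected, f)])
      model <- xgboost(data = subset, label = labels,
                       nrounds = 10,  # a limited number of trees trains fast
                       objective = "multi:softmax",
                       num_class = length(unique(labels)),
                       verbose = 0)
      # training-set accuracy for brevity; use held-out data in practice
      mean(predict(model, subset) == labels)
    })
    # keep the feature that gives the highest accuracy this round
    selected <- c(selected, remaining[which.max(acc)])
  }
  selected
}
```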
The select_features function returns a list, of which the first member (i.e., .[[1]]) contains a matrix providing the classification accuracy for each feature (columns) across all steps (rows, top row being the first step) of the SFS process. Once a feature has been selected into the selected feature set, the remaining values in its column are set to zero. The second member of the list (i.e., .[[2]]) contains the names of the selected features in the order in which they were selected during the SFS process. The development of classification accuracy with each step of the SFS process can be plotted with the function plot_selection_accuracy (Fig. 3). In the case of the White Stork dataset, we can see that after the sixth selected feature, “z_variance”, the addition of more features yields almost no further improvement in classification accuracy.
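For orientation, a hedged usage sketch follows; select_features, plot_selection_accuracy, filter, cutoff, and no_features are named above, whereas the data and label argument names are assumptions:

```r
library(rabc)

# Hypothetical usage sketch; df_features and vec_label are assumed names.
selection <- select_features(df_features,        # assumed: calculated features
                             vec_label = labels, # assumed: behaviour labels
                             filter = FALSE, cutoff = 0.9, no_features = 5)
selection[[1]]  # accuracy matrix: rows = SFS steps, columns = features
selection[[2]]  # selected feature names, in order of selection
plot_selection_accuracy(selection)  # accuracy per SFS step (cf. Fig. 3)
```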