Modelling
An ensemble modelling approach was used to predict the distribution of GAS in Kenya and Eastern Africa. Ensemble modelling can generate more robust predictions and has been shown to overcome the uncertainties involved in interpreting results from individual models (Araújo & New, 2007; Hao et al., 2019). The ensemble included ten modelling techniques: surface range envelope (SRE), generalized linear models (GLM), generalized additive models (GAM), multivariate adaptive regression splines (MARS), classification tree analysis (CTA), flexible discriminant analysis (FDA), artificial neural networks (ANN), random forest (RF), generalized boosting method (GBM) and maximum entropy (MAXENT). All analyses were conducted in R (R Core Team, 2013) using the biomod2 package (Thuiller et al., 2013) with default SDM settings (Appendix 1).
Occurrence data were split randomly: 70% were used as training data for model calibration and the remaining 30% were used to evaluate the predictive performance of each model. The area under the receiver operating characteristic curve (AUC) and the True Skill Statistic (TSS) were used to assess the accuracy of the model predictions against the validation data.
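The split and the two evaluation statistics can be illustrated with a minimal Python sketch. It is not the biomod2 implementation used in this study: the function names, the fixed suitability threshold of 0.5 and the toy data are assumptions made only for illustration. TSS is computed as sensitivity + specificity - 1, and AUC as the proportion of presence/absence pairs in which the presence receives the higher predicted suitability.

```python
import random


def split_occurrences(records, train_fraction=0.7, seed=0):
    """Randomly split records into calibration and validation subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def true_skill_statistic(observed, suitability, threshold=0.5):
    """TSS = sensitivity + specificity - 1 (threshold is an assumption)."""
    tp = fp = tn = fn = 0
    for obs, score in zip(observed, suitability):
        predicted_present = score >= threshold
        if obs and predicted_present:
            tp += 1
        elif obs:
            fn += 1
        elif predicted_present:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def auc(observed, suitability):
    """AUC as the share of presence/absence pairs ranked correctly (ties = 0.5)."""
    presences = [s for o, s in zip(observed, suitability) if o]
    absences = [s for o, s in zip(observed, suitability) if not o]
    wins = sum(
        1.0 if p > a else 0.5 if p == a else 0.0
        for p in presences
        for a in absences
    )
    return wins / (len(presences) * len(absences))


# Toy validation data (hypothetical): 1 = presence, 0 = absence.
observed = [1, 1, 1, 0, 0, 0]
suitability = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

calibration, validation = split_occurrences(range(10))
tss_value = true_skill_statistic(observed, suitability)
auc_value = auc(observed, suitability)
```

TSS ranges from -1 to 1 and AUC from 0 to 1; values near 0 and 0.5 respectively indicate performance no better than random.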
Ensembles were built from all models with a validation TSS ≥ 0.4, a value considered to indicate moderate performance (Landis & Koch, 1977). To construct the ensemble, the mean suitability predicted by the retained models was calculated, weighted by the accuracy (TSS) of each model.
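The filtering and weighting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the biomod2 routine itself: the function name, the dictionary layout and the example scores are hypothetical. Models below the TSS cut-off are dropped, and the remaining predictions are averaged with weights proportional to each model's TSS.

```python
def weighted_ensemble(predictions, tss_scores, min_tss=0.4):
    """TSS-weighted mean suitability across models with TSS >= min_tss.

    predictions: dict mapping model name to a list of suitability values,
                 one per site (all lists the same length).
    tss_scores:  dict mapping model name to its validation TSS.
    """
    retained = [name for name, tss in tss_scores.items() if tss >= min_tss]
    if not retained:
        raise ValueError("no model reaches the TSS cut-off")
    total_weight = sum(tss_scores[name] for name in retained)
    n_sites = len(predictions[retained[0]])
    return [
        sum(tss_scores[name] * predictions[name][i] for name in retained)
        / total_weight
        for i in range(n_sites)
    ]


# Hypothetical scores for three models at two sites.
predictions = {"GLM": [0.2, 0.8], "RF": [0.4, 1.0], "SRE": [0.9, 0.1]}
tss_scores = {"GLM": 0.5, "RF": 0.75, "SRE": 0.3}

# SRE falls below the cut-off and is excluded from the mean.
ensemble = weighted_ensemble(predictions, tss_scores)
```

Because the weights are normalised by their sum, the ensemble suitability stays on the same scale as the individual model predictions.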
The importance of each environmental variable for the distribution of GAS was calculated using all models. Each variable in turn was randomly shuffled and model predictions were made with the shuffled dataset. Pearson’s correlation coefficient (r) was then calculated between the predictions made with the original data and those made with the shuffled data. Importance was calculated as 1 - r, so the higher the value, the greater the influence of the variable on the model.
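The shuffling procedure can be sketched in Python for a single model and a single variable. The helper names, the toy model and the variable names (`temp`, `rain`) are hypothetical and serve only to illustrate the 1 - r calculation; any fitted model that returns a suitability value per site could be substituted for `predict`.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def variable_importance(predict, rows, variable, seed=0):
    """Importance of one variable as 1 - r between original and shuffled predictions."""
    reference = [predict(row) for row in rows]
    values = [row[variable] for row in rows]
    random.Random(seed).shuffle(values)
    # Copy each row, replacing only the shuffled variable.
    shuffled_rows = [{**row, variable: v} for row, v in zip(rows, values)]
    shuffled = [predict(row) for row in shuffled_rows]
    return 1 - pearson_r(reference, shuffled)


# Toy model (assumption): suitability depends on temperature only.
def toy_model(row):
    return 0.04 * row["temp"]


rows = [{"temp": float(t), "rain": float(100 * (t % 7))} for t in range(5, 25)]

importance_temp = variable_importance(toy_model, rows, "temp")
importance_rain = variable_importance(toy_model, rows, "rain")
```

In this toy case shuffling `rain` leaves the predictions unchanged, so r = 1 and the importance is zero, whereas shuffling `temp` breaks the correlation and gives a positive importance.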