Modelling
An ensemble modelling approach was used to predict the distribution of GAS
in Kenya and Eastern Africa. Ensemble modelling can produce more robust
predictions and has been shown to help overcome the uncertainty involved
in interpreting results from individual models (Araújo & New, 2007;
Hao et al., 2019). The ensemble comprised ten modelling techniques:
surface range envelope (SRE), generalized linear models (GLM),
generalized additive models (GAM), multivariate adaptive regression
splines (MARS), classification tree analysis (CTA), flexible
discriminant analysis (FDA), artificial neural networks (ANN), random
forests (RF), generalized boosting models (GBM) and maximum entropy
(MAXENT). All analyses were conducted in R (R Core Team, 2013) using the
biomod2 package (Thuiller et al., 2013) with its default species
distribution model (SDM) settings (Appendix 1).
Occurrence data were randomly split. Seventy percent were used as training data
for model calibration, and the remaining 30% were used to evaluate the
models' predictive performance. The area under the receiver operating
characteristic curve (AUC) and the true skill statistic (TSS) were used
to assess the accuracy of model predictions against the validation data.
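The split and the two evaluation metrics can be illustrated with the minimal Python sketch below. It is not the biomod2 implementation. It assumes that each evaluation record carries a presence (1) or absence/pseudo-absence (0) label and a predicted suitability score. The function names (`split_occurrences`, `auc`, `tss`, `max_tss`) are hypothetical. AUC is computed in its rank (Mann-Whitney) form, and TSS as sensitivity + specificity - 1. Choosing the threshold that maximises TSS is one common convention and is an assumption here.

```python
import random


def split_occurrences(records, train_frac=0.7, seed=42):
    """Randomly split records into calibration (training) and evaluation sets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def auc(y_true, scores):
    """AUC as the probability that a random presence scores higher than a
    random absence (Mann-Whitney form; ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tss(y_true, scores, threshold):
    """True skill statistic (sensitivity + specificity - 1) at one threshold."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn) + tn / (tn + fp) - 1


def max_tss(y_true, scores):
    """Best TSS over all candidate thresholds (here, the observed scores)."""
    return max(tss(y_true, scores, t) for t in sorted(set(scores)))
```

For example, with labels `[1, 1, 1, 0, 0, 0]` and scores `[0.9, 0.8, 0.4, 0.5, 0.2, 0.1]`, AUC is 8/9 and the maximum TSS is 2/3.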
Ensembles were built from all models with a validation TSS ≥ 0.4, a
threshold considered to indicate moderate performance (Landis & Koch,
1977). The ensemble prediction was calculated as the mean suitability
predicted by the retained models, weighted by the accuracy (TSS) of each
model.
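The selection-and-weighting step can be sketched in Python as follows. This is an illustration, not biomod2's own ensemble code. It assumes that each model's predictions are stored as a list of suitability values (one per grid cell) and that each model's weight is simply its validation TSS. The function name `ensemble_suitability` is hypothetical.

```python
def ensemble_suitability(predictions, tss_scores, min_tss=0.4):
    """TSS-weighted mean suitability across models with TSS >= min_tss.

    predictions: dict of model name -> list of suitability values (one per cell)
    tss_scores:  dict of model name -> validation TSS
    """
    kept = [m for m in predictions if tss_scores[m] >= min_tss]
    if not kept:
        raise ValueError("no model reached the TSS threshold")
    total_weight = sum(tss_scores[m] for m in kept)
    n_cells = len(predictions[kept[0]])
    # Weighted mean per cell: sum(TSS_m * S_m) / sum(TSS_m) over retained models.
    return [
        sum(tss_scores[m] * predictions[m][i] for m in kept) / total_weight
        for i in range(n_cells)
    ]
```

With hypothetical inputs `{"GLM": [0.2, 0.8], "RF": [0.4, 0.6], "SRE": [0.9, 0.1]}` and TSS values of 0.6, 0.4 and 0.3, SRE is excluded. The remaining two models give ensemble suitabilities of 0.28 and 0.72.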
The importance of each environmental variable for the distribution of
GAS was calculated using all models. For each variable in turn, its
values were randomly permuted and model predictions were made with this
shuffled dataset. Pearson's correlation coefficient (r) was then
calculated between the predictions made with the original data and those
made with the shuffled data. Importance was calculated as 1 − r, so the
higher the value, the more influence the variable has on the model.