2.4 Spatial prediction using LANDMAP
To determine the best model, we applied the Super Learner ensemble algorithm as implemented in the LANDMAP package v0.0.14 for R v4.1.0, which provides a framework for automated mapping by performing spatial prediction with raster data as predictors (Hengl et al., 2018, 2021; Polley & Laan, 2010; RStudio Team, 2021) (https://github.com/Envirometrix/landmap). The Super Learner ensemble machine learning (ML) algorithm developed by Polley & Laan (2010) estimates the performance of multiple ML models by cross-validation. It then combines the models into an ensemble, an optimal weighted average of their predictions, with the weights chosen according to each model's performance on the held-out test data (van der Laan et al., 2007). The LANDMAP package provides 41 predictive algorithms. The methods included in the model ensemble were decision-tree-based methods (random forest), kernel-based methods (support vector machines), neural-network-based methods, and generalized linear models. We combined these different families because we assumed that each describes the relationships in our data in a different manner.
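The cross-validated weighting step described above can be sketched in a few lines of NumPy. This is a minimal illustration of the Super Learner idea under stated assumptions, not the LANDMAP implementation: the three base learners (an intercept-only mean, ordinary least squares and k-nearest neighbours) are hypothetical stand-ins for the random forest, support vector machine, neural network and GLM learners; the folds are random rather than spatial; and the meta-learner is non-negative least squares (a common choice in Super Learner implementations), solved here by projected gradient descent, with the weights rescaled to sum to one.

```python
import numpy as np

# Hypothetical base learners: each takes (X, y) and returns a predict function.
def fit_mean(X, y):
    """Intercept-only baseline."""
    m = y.mean()
    return lambda Xn: np.full(len(Xn), m)

def fit_ols(X, y):
    """Ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def fit_knn(X, y, k=5):
    """k-nearest-neighbour regression in covariate space."""
    def predict(Xn):
        d = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

def nnls_pg(A, b, n_iter=20000):
    """Minimize 0.5 * ||A w - b||^2 subject to w >= 0 (projected gradient)."""
    w = np.full(A.shape[1], 1.0 / A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        w = np.maximum(0.0, w - step * (A.T @ (A @ w - b)))
    return w

def super_learner(X, y, learners, n_folds=5, seed=0):
    """Return ensemble weights and a predict function for the weighted ensemble."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # random (not spatial) folds: an assumption
    Z = np.zeros((len(y), len(learners)))      # out-of-fold predictions, one column per learner
    for f in range(n_folds):
        train, test = folds != f, folds == f
        for j, fit in enumerate(learners):
            Z[test, j] = fit(X[train], y[train])(X[test])
    w = nnls_pg(Z, y)
    w = w / w.sum()                            # convex combination of the learners
    fitted = [fit(X, y) for fit in learners]   # refit each learner on all data
    return w, lambda Xn: np.column_stack([m(Xn) for m in fitted]) @ w
```

Because the weights are estimated from out-of-fold predictions, a learner that overfits the training data is not rewarded; on data generated from a linear trend, for instance, most of the weight should go to the OLS learner.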
We took advantage of the geographic distances in our training data by including oblique geographic coordinates as covariates, as in previous studies (Møller et al., 2020), and assumed that there was no collinearity between covariates. We expressed the uncertainty of our estimates as a percentage: for each pixel, the width of the 68% prediction interval divided by the mean prediction, following Viscarra Rossel et al. (2014). We used 5-fold spatial cross-validation (spCV) to assess the predictive accuracy of our modeling framework (Brenning, 2012; James et al., 2013b; Wadoux et al., 2021). Because each held-out fold is spatially separated from the data used to fit the model, spCV yields model-independent residuals, which are required to compute map quality indicators such as the coefficient of determination (r²) and the root mean square error (RMSE). To compare model accuracy among forest types, we used Taylor diagrams (Wadoux et al., 2022).
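The quantities used in this paragraph can be sketched as small NumPy functions. All function names are hypothetical, and several definitions are assumptions rather than the exact procedures used here: oblique coordinates are taken as projections of (x, y) onto axes rotated by evenly spaced angles in [0, π), in the spirit of Møller et al. (2020); the 68% interval bounds are taken as given (for example, the 16th and 84th percentiles of a predictive distribution); spCV folds are built by assigning square blocks of a user-chosen size to folds at random, which is only one of several ways to form spatial folds; and r² is computed as 1 − SSE/SST (some studies instead use the squared Pearson correlation).

```python
import numpy as np

def oblique_coordinates(x, y, n_angles=6):
    """Project (x, y) onto axes rotated by n_angles evenly spaced angles in
    [0, pi); returns one covariate column per angle (hypothetical helper)."""
    angles = np.pi * np.arange(n_angles) / n_angles
    return np.column_stack([x * np.cos(a) + y * np.sin(a) for a in angles])

def relative_uncertainty(lower, upper, mean_pred):
    """Width of the 68% prediction interval divided by the mean prediction, in %."""
    return 100.0 * (upper - lower) / mean_pred

def spatial_block_folds(x, y, block_size, n_folds=5, seed=0):
    """Assign points to square blocks, then whole blocks to folds at random."""
    cells = np.column_stack([np.floor(x / block_size), np.floor(y / block_size)])
    _, block_id = np.unique(cells, axis=0, return_inverse=True)
    block_id = block_id.ravel()  # keep the block index 1-D across NumPy versions
    rng = np.random.default_rng(seed)
    fold_of_block = rng.permutation(block_id.max() + 1) % n_folds
    return fold_of_block[block_id]

def r2(obs, pred):
    """Coefficient of determination as 1 - SSE/SST (assumed definition)."""
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, pred):
    """Root mean square error."""
    return np.sqrt(np.mean((obs - pred) ** 2))

def taylor_stats(obs, pred):
    """Standard deviations, Pearson correlation and centred RMS difference."""
    sd_obs, sd_pred = obs.std(), pred.std()
    r = np.corrcoef(obs, pred)[0, 1]
    crmsd = np.sqrt(np.mean(((pred - pred.mean()) - (obs - obs.mean())) ** 2))
    return sd_obs, sd_pred, r, crmsd
```

A Taylor diagram can show all three statistics in one plot because they satisfy crmsd² = sd_obs² + sd_pred² − 2·sd_obs·sd_pred·r, so each model (or forest type) appears as a single point whose distance from the reference point is its centred RMS difference.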