Climatic Variables and Niche Modeling
To generate the distribution predictions for the three canid species, we gathered 19 climatic variables (Hijmans et al., 2005), available for the current, and future projections of the SSP585 climate scenario from global climate models (GCM’s: BCC-CSM2-MR; CanESM5; IPSL-CM6A-LR; MIROC-ES2L; and MIROC6), selected by a cluster analysis (Varela et al., 2015) in the WorldClim v2.1 database. We cut all variables to the extent of the South American continent, with a grid resolution of 0.041º at the Equator (~4 Km or 2.5 arc-min).
We standardized all variables, so they had their means equal to 0 and their variance equal to 1. Later, we applied a spatialized principal component analysis (PCA) based on a correlation matrix of the variables to produce 19 orthogonal/independent principal components (PCs) to be used as our new environmental variables (De Marco and Nóbrega, 2018). We selected the first six PCs from those, responsible for ~97% of the original raw climatic variance of the predictor variables (Appendix, Table 1). These independent variables allow us to produce more reliable predictions for the species distributions and avoid model overfitting (Jiménez-Valverde et al., 2011).
We partitioned the three species occurrences using a checkerboard pattern (Muscarella et al., 2014; Roberts et al., 2017), with an aggregation factor of two. According to this partitioning pattern, the species occurrences are divided into two subsets geographically structured as a checkerboard table. One of these subsets is used to train the models on a first run, while the other is used to evaluate the models. In a second run, these subsets are inverted, and the subset used to evaluate the models will be used to train the models, and that used to train the modes will be used to evaluate the models. We consider the proportion of subsets always 30% of data for testing and 70% for training.
We considered three different modeling methods to produce our models, ranging from statistical methods [i.e., Random Forest (RDF)], machine learning methods [i.e. MaxEnt (MAX) and Support Vector Machines (SVM)]. We trained the models for the three species considering the ecoregions shapefile provided by the World Wildlife Fund website (https://www.worldwildlife.org/biomes), restricting the climatic variables to ecoregions with known occurrences for the species. This procedure is very important to delimit the M section of the Biotic-Abiotic-Migration diagram (Soberón and Peterson 2005; Soberón 2007; Barve et al. 2011; Saupe et al. 2012) and improve the model predictions for the focus species.
We made use of pseudo-absences to train the models. Therefore, we restricted their allocation in the geographic space after they were environmentally restricted, considering an environmental space based on a climatic environment (VanDerWal et al. 2009; Lobo and Tognelli 2011).
To evaluate our models, we considered the Jaccard´s similarity index proposed by Leroy et al. (2018), which varies from 0 to 1, where around zero values indicate no-better-than-random predictions, while values higher than 0.5 are considered as acceptable predictions. Differently from other metrics commonly used in distribution modelling studies [e.g., area under the curve (Fielding and Bell, 1997) or true skill statistic (Allouche et al., 2006)], the Jaccard index is both prevalence and pseudo-absence independent, yielding less uncertain metrics for the models.
To cut the suitability matrices into presence/absence matrices, we considered the threshold that maximized the Jaccard index. Finally, we used a mean ensemble weighted by the best Jaccard values in each modeling method in order to produce the species final distribution range (Figure 2). We perform the modeling by the package ENMTML (De Andrade et al., 2020) in software R (v4.0).