Climatic Variables and Niche Modeling
To generate the distribution
predictions for the three canid species, we gathered 19 climatic
variables (Hijmans et al., 2005), available for the current, and future
projections of the SSP585 climate scenario from global climate models
(GCM’s: BCC-CSM2-MR; CanESM5; IPSL-CM6A-LR; MIROC-ES2L; and MIROC6),
selected by a cluster analysis (Varela et al., 2015) in the WorldClim
v2.1 database. We cut all variables to the extent of the South American
continent, with a grid resolution of 0.041º at the Equator
(~4 Km or 2.5 arc-min).
We standardized all variables, so they had their means equal to 0 and
their variance equal to 1. Later, we applied a spatialized principal
component analysis (PCA) based on a correlation matrix of the variables
to produce 19 orthogonal/independent principal components (PCs) to be
used as our new environmental variables (De Marco and Nóbrega, 2018). We
selected the first six PCs from those, responsible for
~97% of the original raw climatic variance of the
predictor variables (Appendix, Table 1). These independent variables
allow us to produce more reliable predictions for the species
distributions and avoid model overfitting (Jiménez-Valverde et al.,
2011).
We partitioned the three species occurrences using a checkerboard
pattern (Muscarella et al., 2014; Roberts et al., 2017), with an
aggregation factor of two. According to this partitioning pattern, the
species occurrences are divided into two subsets geographically
structured as a checkerboard table. One of these subsets is used to
train the models on a first run, while the other is used to evaluate the
models. In a second run, these subsets are inverted, and the subset used
to evaluate the models will be used to train the models, and that used
to train the modes will be used to evaluate the models. We consider the
proportion of subsets always 30% of data for testing and 70% for
training.
We considered three different modeling methods to produce our models,
ranging from statistical methods [i.e., Random Forest (RDF)],
machine learning methods [i.e. MaxEnt (MAX) and Support Vector
Machines (SVM)]. We trained the models for the three species
considering the ecoregions shapefile provided by the World Wildlife Fund
website (https://www.worldwildlife.org/biomes), restricting the
climatic variables to ecoregions with known occurrences for the species.
This procedure is very important to delimit the M section of the
Biotic-Abiotic-Migration diagram (Soberón and Peterson 2005; Soberón
2007; Barve et al. 2011; Saupe et al. 2012) and improve the model
predictions for the focus species.
We made use of pseudo-absences to train the models. Therefore, we
restricted their allocation in the geographic space after they were
environmentally restricted, considering an environmental space based on
a climatic environment (VanDerWal et al. 2009; Lobo and Tognelli 2011).
To evaluate our models, we considered the Jaccard´s similarity index
proposed by Leroy et al. (2018), which varies from 0 to 1, where around
zero values indicate no-better-than-random predictions, while values
higher than 0.5 are considered as acceptable predictions. Differently
from other metrics commonly used in distribution modelling studies
[e.g., area under the curve (Fielding and Bell, 1997) or true skill
statistic (Allouche et al., 2006)], the Jaccard index is both
prevalence and pseudo-absence independent, yielding less uncertain
metrics for the models.
To cut the suitability matrices into presence/absence matrices, we
considered the threshold that maximized the Jaccard index. Finally, we
used a mean ensemble weighted by the best Jaccard values in each
modeling method in order to produce the species final distribution range
(Figure 2). We perform the modeling by the package ENMTML (De Andrade et
al., 2020) in software R (v4.0).