2.3 Modeling ecological niches
For taxa with fewer than 25 localities, we used the jackknife (leave-one-out)
cross-validation procedure to improve accuracy and to minimize model
variability. In this method, one presence record is excluded at a time and
the procedure is repeated as many times as the data allow, so that every
record is excluded exactly once. In each iteration, the remaining n − 1
records are used as model training information and the excluded record is
used for model evaluation (Shcheglovitova and Anderson 2013). For species
with more than 25 occurrence records, we instead applied the
get.checkerboard1 function of the “ENMeval” package (Muscarella et al.
2014). In this method, the occurrence database is divided into two sets
following a checkerboard pattern across the extent of the study area, with
each occurrence assigned according to its position on the board. We then
defined the set with more records as training data and the set with fewer
records as evaluation data (Brown 2014).
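The two partitioning schemes above can be sketched as follows. This is a minimal Python illustration of the logic only (the actual analysis used the R package “ENMeval”); the function names and the cell_size parameter are ours, not part of the original workflow.

```python
import math

def jackknife_partitions(records):
    """Leave-one-out partitioning for taxa with fewer than 25 localities:
    each presence record is held out for evaluation exactly once."""
    for i in range(len(records)):
        training = records[:i] + records[i + 1:]   # the n - 1 remaining records
        evaluation = [records[i]]                  # the single excluded record
        yield training, evaluation

def checkerboard_partitions(records, cell_size):
    """Checkerboard partitioning for species with 25 or more records
    (cf. get.checkerboard1): each (lon, lat) occurrence is assigned to one
    of two sets by the parity of its grid cell across the study extent."""
    set_a, set_b = [], []
    for lon, lat in records:
        col = math.floor(lon / cell_size)
        row = math.floor(lat / cell_size)
        (set_a if (col + row) % 2 == 0 else set_b).append((lon, lat))
    # the larger set becomes training data, the smaller evaluation data
    training, evaluation = sorted([set_a, set_b], key=len, reverse=True)
    return training, evaluation
```

Note that the jackknife yields n training/evaluation splits, whereas the checkerboard yields a single split whose balance depends on the spatial arrangement of the occurrences.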
To characterize the ecological niches of sea snakes we used the MaxEnt
3.4.1 algorithm (Phillips et al. 2006). A series of candidate models was
calibrated with the training database of each species via the “kuenm”
package (Cobos et al. 2019), which allows varying combinations of MaxEnt
parameters. We tested combinations of seven feature classes (“l”, “q”,
“p”, “lq”, “lqp”, “lp”, and “qp”, where “l” is linear, “q” is quadratic,
and “p” is product), eight regularization multipliers (0.10, 0.25, 0.50,
0.75, 1, 2, 3, and 4), and the five groups of environmental variables
described above (thus, the maximum possible number of variables in any
model corresponds to the full “repository/depth” set, when no correlation
is found among the variables). This design results in 560 candidate models
per species (2 resolutions × 7 feature classes × 8 regularization
multipliers × 5 groups of
variables). We then used the evaluation database to select the best subset
of models that met the following criteria, applied hierarchically:
1) statistical significance, retaining models that performed better than
expected by chance according to the proportion of bootstrap replicates
with partial-ROC area ratios > 1 (Peterson et al. 2008); 2) predictive
capacity, retaining the statistically significant models that also
predicted at least 90% of the evaluation records (i.e., models with an
omission rate –OR– ≤ 0.10); and 3) complexity, using the Akaike
information criterion corrected for small sample sizes to retain models
within 2 units of the minimum (ΔAICc ≤ 2), i.e., those with the best fit
and fewest parameters (Radosavljevic and Anderson 2014; Warren and
Seifert 2011).
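The candidate-model grid and the three-step filter can be sketched as follows. This is a hedged Python illustration: in practice the significance, omission-rate, and AICc values are produced by “kuenm”, and the dictionary keys and group labels below are ours.

```python
from itertools import product

FEATURE_CLASSES = ["l", "q", "p", "lq", "lqp", "lp", "qp"]
REG_MULTIPLIERS = [0.10, 0.25, 0.50, 0.75, 1, 2, 3, 4]
VARIABLE_GROUPS = [f"set_{i}" for i in range(1, 6)]   # the 5 variable groups (labels illustrative)
RESOLUTIONS = ["coarse", "fine"]                      # the 2 spatial resolutions (labels illustrative)

candidate_grid = list(product(RESOLUTIONS, FEATURE_CLASSES,
                              REG_MULTIPLIERS, VARIABLE_GROUPS))
# 2 x 7 x 8 x 5 = 560 candidate models per species

def select_best(models, max_or=0.10, max_delta_aicc=2.0):
    """Hierarchical selection: (1) keep statistically significant models
    (partial-ROC better than chance), (2) keep those with omission rate
    <= 0.10, (3) keep those within 2 AICc units of the best survivor."""
    significant = [m for m in models if m["p_roc_significant"]]
    low_omission = [m for m in significant if m["omission_rate"] <= max_or]
    if not low_omission:
        return []
    best_aicc = min(m["aicc"] for m in low_omission)
    return [m for m in low_omission if m["aicc"] - best_aicc <= max_delta_aicc]
```

Applying ΔAICc only to the models that already passed the first two filters reflects the hierarchical order of the criteria: complexity only breaks ties among models that are both significant and predictive.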
Based on the best parameter combinations (the full set of selected
combinations is available in Table SM2), we built a final set of models
using the whole database (training and evaluation data) with the bootstrap
functionality of MaxEnt, performing 10 replicates. In each iteration,
presence records were divided randomly into 80% for training and 20% for
evaluation; we defined cloglog as the output format and used 10,000
background points (masked by our calibration area). Finally, we calculated
the median and the range of the predicted values across all replicates
(10 replicates × each selected parameter combination) to represent the
consistency and variation of the predictions.
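The final summary step can be sketched as follows, using NumPy arrays as stand-ins for the cloglog prediction rasters; the function name is ours and the real rasters would come from the MaxEnt replicates.

```python
import numpy as np

def summarize_replicates(predictions):
    """Summarize replicate predictions (a list of equally shaped arrays of
    cloglog suitability values): the cell-wise median represents the
    consistent signal across replicates, and the cell-wise range
    (max - min) represents the variation."""
    stack = np.stack(predictions, axis=0)        # replicates x rows x cols
    median = np.median(stack, axis=0)
    value_range = stack.max(axis=0) - stack.min(axis=0)
    return median, value_range

# illustrative use: 10 bootstrap replicates of a 3 x 4 prediction grid
rng = np.random.default_rng(0)
replicates = [rng.uniform(0, 1, size=(3, 4)) for _ in range(10)]
median_map, range_map = summarize_replicates(replicates)
```

With 10 replicates per selected parameter combination, the same summary is taken over the full stack of replicate rasters, so cells with a wide range flag areas where the prediction is unstable.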