2.3 Modeling ecological niches
We used the jackknife cross-validation procedure to improve the accuracy and minimize the variability of the models for taxa with fewer than 25 localities. In this method, one presence record is excluded from model calibration and the procedure is repeated as many times as the data allow, so that every record is excluded exactly once. In each iteration, the remaining n − 1 records are used as training data and the excluded record is used for model evaluation (Shcheglovitova and Anderson 2013). For species with more than 25 occurrence records, we applied the get.checkerboard1 function in the “ENMeval” package (Muscarella et al. 2014). In this method, the occurrence database is divided into two sets following a checkerboard pattern across the extent of the study area, with each occurrence assigned according to its position on the board. We defined the set with more records as training data and the set with fewer records as evaluation data (Brown 2014).
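The two partitioning schemes can be illustrated with a short language-neutral sketch (the actual analyses use the R package “ENMeval”; the Python below only mirrors the partitioning logic, and the function names and the `cell_size` parameter are our own):

```python
def jackknife_partitions(records):
    """Leave-one-out partitions: each record serves exactly once as the
    evaluation point while the remaining n - 1 records train the model."""
    parts = []
    for i in range(len(records)):
        test = [records[i]]
        train = records[:i] + records[i + 1:]
        parts.append((train, test))
    return parts

def checkerboard_partition(records, cell_size):
    """Assign each (x, y) occurrence to one of two groups following a
    checkerboard pattern over the study extent (cf. get.checkerboard1)."""
    groups = {0: [], 1: []}
    for x, y in records:
        col, row = int(x // cell_size), int(y // cell_size)
        groups[(col + row) % 2].append((x, y))
    # the larger group is used for training, the smaller for evaluation
    train, test = sorted(groups.values(), key=len, reverse=True)
    return train, test
```

Note that the jackknife is deterministic (every record is held out once), whereas the checkerboard split depends only on each occurrence's spatial position, which helps reduce spatial autocorrelation between training and evaluation sets.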
To characterize the ecological niches of sea snakes, we used the MaxEnt 3.4.1 algorithm (Phillips et al. 2006). A series of candidate models was calibrated with the training database of each species via the “kuenm” package (Cobos et al. 2019), which allows varying combinations of MaxEnt parameters. We tested combinations of seven feature classes (“l”, “q”, “p”, “lq”, “lqp”, “lp”, and “qp”, where “l” is linear, “q” is quadratic, and “p” is product), eight regularization multipliers (0.10, 0.25, 0.50, 0.75, 1, 2, 3, and 4), and the five groups of environmental variables described above (thus, the maximum number of variables used to build any model is the full “repository/depth” set, if no variables were correlated). This results in 560 candidate models per species (2 resolutions × 7 feature classes × 8 regularization multipliers × 5 variable groups). We then used the evaluation database to select the best subset of models that met the following criteria, applied hierarchically: 1) statistical significance, retaining models that performed better than expected by chance, based on the proportion of bootstrap replicates with partial ROC area ratios > 1 (Peterson et al. 2008); 2) predictive ability, retaining significant models that also predicted at least 90% of the evaluation records (i.e., models with an omission rate –OR– ≤ 0.10); and 3) complexity, using the Akaike information criterion corrected for small sample sizes to retain models within 2 units of the minimum (ΔAICc ≤ 2), i.e., those with the best fit and fewest parameters (Radosavljevic and Anderson 2014; Warren and Seifert 2011). Based on the best parameter combinations (the full set of best combinations is available in Table SM2), we built a final set of models using the whole database (training and evaluation data) with the bootstrap functionality of MaxEnt, performing 10 replicates.
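The hierarchical selection of candidate models can be sketched as a three-step filter (the actual selection is performed by “kuenm”; the Python below is an illustration, and the dictionary keys `proc_pvalue`, `omission_rate`, and `aicc` are hypothetical names for the statistics described in the text):

```python
def select_best_models(candidates):
    """Hierarchical filtering of candidate models, mirroring the three
    criteria in the text. Each candidate is a dict with (hypothetical)
    keys: 'proc_pvalue' from the partial-ROC bootstrap, 'omission_rate'
    at the 10% evaluation threshold, and 'aicc'."""
    # 1) statistical significance: partial ROC better than chance
    significant = [m for m in candidates if m["proc_pvalue"] < 0.05]
    # 2) predictive ability: omission rate <= 0.10 on evaluation records
    low_omission = [m for m in significant if m["omission_rate"] <= 0.10]
    if not low_omission:
        return []
    # 3) complexity: delta AICc <= 2 relative to the pool minimum
    min_aicc = min(m["aicc"] for m in low_omission)
    return [m for m in low_omission if m["aicc"] - min_aicc <= 2]
```

Because the criteria are applied in order, ΔAICc is computed only among the statistically significant, low-omission models, so a simple but poorly predicting model cannot win on parsimony alone.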
In each replicate, we randomly divided the presence records into 80% for training and 20% for evaluation; we set cloglog as the output format and used 10,000 background points (masked by our calibration area). Finally, we calculated the median and the range of the predicted values across all replicates (10 replicates × each final parameter combination) to represent the consistency and variation of the predictions.
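Summarizing replicate predictions by their median and range can be sketched as follows (an illustrative Python snippet, not the authors' workflow; `prediction_stack` is a hypothetical name for the stacked replicate outputs):

```python
import statistics

def summarize_replicates(prediction_stack):
    """Per-cell median and range across replicate predictions.
    `prediction_stack` is a list of replicate outputs, each a list of
    suitability values (one value per grid cell, same cell order)."""
    medians, ranges = [], []
    for cell_values in zip(*prediction_stack):
        medians.append(statistics.median(cell_values))
        ranges.append(max(cell_values) - min(cell_values))
    return medians, ranges
```

The median gives a robust central prediction per cell, while the range flags cells where the replicates disagree, i.e., where the model output is least consistent.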