2.2 Environmental variables acquisition and filtering
The spatial distribution of species can be significantly influenced by
the surrounding habitat environment (Zhao et al., 2023). In this study,
we obtained raster data of variable projections from the comprehensive
dataset Bio-ORACLE
(http://www.bio-oracle.org) and
Global Marine Environment Datasets (http://gmed.aucklan-d.ac.nz),
considering their biological relevance and data availability. To assess
collinearity among candidate predictor variables, scatter collinearity
analysis, Pearson’s correlation coefficients, and the variation
inflation factor (VIF) were employed (Zuur et al., 2010). Variables with
correlation values below 0.8 and VIF values below 10 were retained.
Nine
predictors were considered based on the collinearity analysis, including
mean current velocity, mean salinity, mean temperature, temperature
range, mean dissolved oxygen, light at bottom, phytoplankton, water
depth, and land distance (Table 1, Fig. 2).
Bio-ORACLE provides future environmental projections for two time
periods (2040-2025, 2090-2100) using three atmospheric-ocean general
circulation models (AOGCMs: CCSM4, HadGEM2-ES and MIROC5) under four
Representative Concentration Pathways (RCPs) emission scenarios: RCP
2.6, RCP 4.5, RCP 6.0 and RCP8.5
(Assis
et al., 2018). To reduce uncertainty in environmental projections, we
averaged the predictions from the three AOGCMs to represent future
climate conditions. RCP 2.6 indicates an optimistic emission level with
well-controlled greenhouse gas concentration, while RCP 4.5 and RCP 6.0
represent moderate emission levels. RCP8.5 represents a pessimistic
scenario without controlled emissions (Moss et al., 2010). For
comparability, we used two RCPs (RCP 2.6 and RCP 8.5) to predict future
distributions for the 2050s (2040–2050) and 2100s (2090–2100). Future
environmental factors, including current velocity, mean salinity, mean
temperature, temperature range, are downloaded from Bio-ORACLE, while
the other factors were assumed to remain unchanged in the future (Zhang
et al., 2021).
2.3
Model construction and assessment
The modeling program used in this study is SDM based on the biomod2
package in the R platform (Thuiller et al., 2020). The package includes
ten modeling algorithms: maximum entropy (Maxent), random forest (RF),
surface extent envelope
(SRE),
multiple adaptive regression splines (MARS), artificial neural network
(Moss et al.), flexible discriminant analysis (FDA), classification tree
analysis (CTA), generalized boosting model (GBM), generalized linear
model (GLM), and generalized additive model (GAM). Species distribution
modeling and prediction rely on existing/pseudo-missing records and
current environmental data. Due to limited actual sampling point data
for most species, pseudo-distribution data is used as a substitute to
overcome this limitation (Barbet-Massin et al., 2012). The
pseudo-absences function in the R package MOPA is employed to randomly
simulate an equal number of pseudo-absent records and compare them with
the conditions of actual presence points, thereby improving the model’s
predictive performance (Guisan et al., 2017). To evaluate the accuracy
of the model predictions, a five-fold cross-validation method with 10
replicates was used (Fu et al., 2021). In this methodology, 80% of the
dataset is randomly selected for model calibration and testing, while
the remaining 20% is reserved for assessing model predictions.
Two model evaluation indices embedded in biomod2, namely true skill
statistics (TSS) and the area under the receiver operating
characteristic curve (AUC), are calculated to estimate predictive
accuracy. Given the potential diversity in outcomes generated by the ten
models, an ensemble projecting approach is adopted to reduce uncertainty
and enhance reliability (Buisson et al., 2010; Morato et al., 2020). To
improve the accuracy of ensemble models, only models with an AUC greater
than 0.8 and a TSS greater than 0.7 are retained, indicating high
predicted accuracy and low uncertainty (Allouche et al., 2010; Mei et
al.,
2017).
Species-level and population-level modeling analyses are conducted,
drawing species response curves for each environmental variable using
the generated SDMs to visualize the variation in species occurrence
probability along the environmental gradient. The significance of each
predictor is estimated using a randomly permuted method (Guisan et al.,
2017). The potential distributions of the entire species and the two
populations (EIOS and WPI) under present and future climate scenarios
(2050s, 2100s) for RCP 2.6 and RCP 8.5, respectively, were predicted
using the ensemble models at both species and population levels.
Continuous habitat suitability maps are created based on the direct
outputs from the ensemble models. Using automatically derived thresholds
that maximize the TSS values of the ensemble model, we translated the
continuous prediction into binary values for a clearer comprehension of
habitat appropriateness (Liu et al., 2013).