Environmental variables
Altitude and a set of 19 bioclimatic variables were used in this study (Table 1). The bioclimatic variables were derived from monthly meteorological data (Fick & Hijmans, 2017). They can be clustered into four groups representing annual trends, seasonality, and extreme or limiting environmental factors (van Zonneveld, Castaneda, Scheldeman, van Etten, & Van Damme, 2014). All variables had a spatial resolution of 5 arc-minutes (~10 km × 10 km) and covered 180°W to 180°E longitude and 60°S to 84°N latitude.
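The grid geometry implied by this resolution and extent can be checked with a few lines of arithmetic. The sketch below is illustrative only. The figure of 111.32 km per degree at the equator is an assumed spherical approximation, not a value from the data sources, and the cell width shrinks towards the poles.

```python
# Grid geometry implied by a 5 arc-minute resolution over the stated extent.
ARCMIN_PER_DEGREE = 60
RES_ARCMIN = 5
cells_per_degree = ARCMIN_PER_DEGREE // RES_ARCMIN  # 12 cells per degree

lon_min, lon_max = -180, 180  # 180°W to 180°E
lat_min, lat_max = -60, 84    # 60°S to 84°N

n_cols = (lon_max - lon_min) * cells_per_degree  # cells along longitude
n_rows = (lat_max - lat_min) * cells_per_degree  # cells along latitude

# Assumption: about 111.32 km per degree of longitude at the equator.
KM_PER_DEGREE_EQUATOR = 111.32
cell_km_equator = KM_PER_DEGREE_EQUATOR / cells_per_degree

print(n_cols, n_rows, n_cols * n_rows, round(cell_km_equator, 2))
```

Under these assumptions each layer has 4320 × 1728 cells, and a cell is roughly 9.3 km wide at the equator, consistent with the ~10 km × 10 km figure quoted above.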
The altitude data were obtained from CGIAR-CSI (available at https://srtm.csi.cgiar.org/). The current global bioclimatic variables were downloaded from CHELSA (available at http://chelsa-climate.org/) and represent average values for the period 1979-2013 (Karger et al., 2017). The future variables were derived from the projections of eight global climate models (GCMs): BCC-CSM2-MR, CNRM-CM6-1, CNRM-ESM2-1, CanESM5, IPSL-CM6A-LR, MIROC-ES2L, MIROC6 and MRI-ESM2-0. These GCMs were selected according to their data availability and their involvement in relevant model intercomparison projects within CMIP6 (Moseid et al., 2020). The future climate data were 20-year averages for 2021-2040, 2041-2060, 2061-2080 and 2081-2100 under four Shared Socio-economic Pathways (SSPs): 126, 245, 370 and 585. These data were downloaded from WorldClim (available at http://www.worldclim.org). To reduce the regional bias that can arise from relying on a single GCM, a multi-model ensemble (MME) was used to derive the average values for future climates (Mendlik & Gobiet, 2016; Pierce, Barnett, Santer, & Gleckler, 2009). Because collinearity among variables reduces the accuracy of species distribution models (SDMs), seven variables (shown in bold in Table 1) were selected from the 20 environmental variables according to the multiple correlation coefficient (|R| < 0.6) and the variance inflation factor (VIF < 10) among them (Naimi & Araújo, 2016).
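To illustrate the idea behind the collinearity screen, the sketch below applies a simplified greedy filter based on pairwise Pearson correlation with the |R| < 0.6 threshold. It is not the procedure used in the study. The VIF criterion, VIF_j = 1 / (1 - R_j^2), is not implemented here, and the function names, the variable names and the toy values are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)

def screen_by_correlation(columns, threshold=0.6):
    """Greedy screen: keep a variable only if |r| < threshold
    against every variable that has already been kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy samples (one value per sampled grid cell).
samples = {
    "bio1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bio5": [2.1, 4.0, 6.2, 7.9, 10.1],  # almost linear in bio1
    "bio12": [3.0, 1.0, 4.0, 1.0, 5.0],
}

selected = screen_by_correlation(samples)
print(selected)
```

The result of a greedy screen depends on the order in which variables are visited, so in practice the order would be chosen on ecological grounds or replaced by a stepwise VIF procedure.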
Modelling procedure
Biomod2 is an R package developed for species distribution modelling. It includes 10 species distribution algorithms: Artificial Neural Network (ANN), Classification Tree Analysis (CTA), Flexible Discriminant Analysis (FDA), Generalized Additive Model (GAM), Generalized Boosting Model (GBM), Generalized Linear Model (GLM), Multiple Adaptive Regression Splines (MARS), Maximum Entropy (MAXENT), Random Forests (RF) and Surface Range Envelope (SRE) (Thuiller, Georges, Engler, & Breiner, 2016). Because several of these algorithms require absence records, 10,000 pseudo-absences were randomly selected, and this selection was repeated five times, following a common strategy (Guisan, Thuiller, & Zimmermann, 2017; Merow, Smith, & Silander, 2013). 80% of the presence and pseudo-absence data were used to calibrate the models, and the remaining 20% were used for model testing (Guisan et al., 2017). Model calibration and evaluation were repeated 10 times. Response curves and the relative contribution of each variable were calculated. AUC (area under the receiver operating characteristic curve) and TSS (true skill statistic) were used as performance criteria for the 10 algorithms. Models with AUC greater than 0.90 and TSS greater than 0.6 were considered to perform well in species distribution modelling (Allouche, Tsoar, & Kadmon, 2006; Swets, 1988). Using biomod2, an ensemble modelling approach was adopted to build ensemble models and thereby reduce the bias caused by the choice of a single algorithm. With these ensemble models, the potential distributions of A. annua under current and future climate scenarios were projected. Finally, based on these projections, the distribution patterns, range sizes and range shifts of A. annua under the different climate scenarios were analyzed and compared.
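The two evaluation criteria can be made concrete with a minimal sketch. The code below computes a rank-based AUC and the TSS (sensitivity + specificity - 1) on toy data and applies the AUC > 0.90 and TSS > 0.6 rule. It is not the biomod2 implementation. The function names, the test labels, the suitability scores and the 0.45 binarization threshold are hypothetical.

```python
def auc(y_true, scores):
    """Rank-based AUC: probability that a randomly chosen presence
    scores higher than a randomly chosen absence (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def tss(y_true, y_pred):
    """True skill statistic: sensitivity + specificity - 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1

def performs_well(auc_value, tss_value, auc_min=0.90, tss_min=0.6):
    """Selection rule from the text: AUC > 0.90 and TSS > 0.6."""
    return auc_value > auc_min and tss_value > tss_min

# Hypothetical test set: 1 = presence, 0 = pseudo-absence.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1]

THRESHOLD = 0.45  # hypothetical cut-off for binarizing suitability
y_pred = [1 if s >= THRESHOLD else 0 for s in scores]

auc_value = auc(y_true, scores)
tss_value = tss(y_true, y_pred)
print(auc_value, tss_value, performs_well(auc_value, tss_value))
```

In this toy case the AUC is 0.9375 but the TSS is only 0.5, so the model fails the joint criterion. This shows why the two measures are applied together rather than separately.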