Genomic distribution of genetic variants
ANNOVAR (Wang, Li, & Hakonarson, 2010) was used to annotate the genomic distribution of variants and classify them into different categories (nonsynonymous, synonymous, UTR, 5 kb upstream, 5 kb downstream, intronic, and intergenic). The filter set for variants was a minor allele frequency of > 0.01, depth of >3, and missing rate of < 0.2.
Population diversity and structure
We used VCFtools (Danecek et al., 2011) to estimate nucleotide diversity (window size of 40 kb) and the F ST divergence statistic (window size of 40 kb) for each pair of populations. We used Plink (Purcell et al., 2007) (http://pngu.mgh.harvard.edu/~purcell/plink/) to investigate the population structure, with ancestry clusters ranging from two to six. Principal component analysis (PCA) was performed using the GCTA software (Yang, Lee, Goddard, & Visscher, 2011). The filtered SNP dataset was used to generate neighbor-joining trees by using Treebest-1.9.2. Mantel tests withF ST/(1−F ST) matrixes and distance matrixes were performed using the ade4 package (Dray & Dufour, 2007) to test for isolation by distance. Geographical distances among samples were measured by following the coastline (coastal distances) and by the shortest distance across open waters (oceanic distances).
Demographic analysis
The demographic history of all five populations was analyzed using the PSMC model as implemented in the PSMC package (Li & Durbin, 2011). In the absence of mutation rates from S. japonica or any closely related species, we used a mutation transition matrix based on medaka data (Spivakov et al., 2014). Point mutation rate and recombination rate per base were assumed to be 2.5 × 10−8. Generation time was calculated as g = a + (s / [1 −s ]), where s is the expected adult survival rate, which is assumed as 80%; and a is sexual maturation age, which is one year for S. japonica . A generation time of 5 years was used. To determine the variance in the estimated effective population size, we performed 100 bootstraps for each population. Population-level admixture analysis was conducted using the TreeMix v.1.12 program (Pickrell & Pritchard, 2012). The program inferred the ML tree for five populations. A window size of 1000 was used to account for linkage disequilibrium (–k) and “–global” to generate the ML tree. Migration events (–m) were sequentially added to the tree.
Screening for signatures of selection
To uncover the genetic variants involved in local adaptation of each population, we calculated the genome-wide distribution ofF ST values and π ratios for four control–thermal groups. These groups included the cold-temperature population RS versus the control groups (ZS and ST), the high-temperature population ST versus the control groups (RS and ZS), the China warm-temperature population ZS versus the control group (RS), and the Japan warm-temperature populations (IB and TB) versus the control group (RS).F ST and π for sliding windows were calculated using VCFtools (Danecek et al., 2011), with a window size of 40 kb and a step size of 20 kb. The windows with the top 5% values for theF ST and π ratios simultaneously served as the candidate outliers under strong selective sweeps. All outlier windows were assigned to their corresponding SNPs and genes. Overlapping genes within cold-, warm-, and high-temperature groups were selected for further analysis to avoid false positives. The selected genes in GO terms and KEGG pathways were enriched using OmicShare tools (http://www.omicshare.com/tools). Multiple comparisons were corrected using the FDR method (FDR-adjustedP value <0.05).
Distribution prediction of China and Japan populations under climate change
We used maximum entropy (Phillips, Anderson, & Schapire, 2006) (Maxent) model to predict potential distributions of two independent populations (China and Japan) under changing climates. We retrieved species presence records from Zhang et al. (2019) and divided them into China (22 recorders) and Japan (23 recorders) groups. On the basis of the results of Zhang et al. (2019) and the benthic life type of S. japonica , we considered five environment predictors from Bio-ORACLE (http://www.bio-oracle.org), including ocean depth, distance to shore, mean sea benthic temperature, salinity, and current velocity. The five predictors were not highly correlated (i.e., pairwise Pearson’s correlation coefficient |r | < 0.70) and thus all were used to develop Maxent models. Future (i.e., RCP 4.5 in 2100s [average for 2090 to 2100]) marine environmental predictors, including temperature, salinity, and current velocity, were also downloaded from Bio-ORALCE. We assumed that ocean depth and distance to shore would remain unchanged in the future. For each group, we developed Maxent model using all presence data of this group and present-day environmental predictors; this model was further used to predict potential distributions under present-day and future climatic conditions.