Data Analysis
ContigExpress was used to assemble and manually correct the forward and reverse sequencing reads. BioEdit was used for sequence alignment and trimming (the first and last 50 bp of each sequence were removed), and the aligned, trimmed sequences were used for subsequent data analysis. The sequences of all samples at the matR and nad5-1 markers were deposited in CNGBdb (China National GeneBank DataBase). The accession numbers are A to B (matR sequences) and C to D (nad5-1 sequences), respectively. (Not uploaded yet.)
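The trimming step can be illustrated with a minimal Python sketch. It is not the BioEdit procedure itself; the function name `trim_ends` and the example sequence are hypothetical, and the only value taken from the text is the 50 bp removed from each end.

```python
def trim_ends(seq: str, n: int = 50) -> str:
    """Remove the first and last n bases of a sequence.

    n = 50 follows the text; the function name is illustrative only.
    """
    if len(seq) <= 2 * n:
        raise ValueError("sequence is too short to trim n bases from each end")
    return seq[n:len(seq) - n]


# Hypothetical example: 50 low-quality bases on each side of a 3 bp core.
example = "A" * 50 + "CGT" + "T" * 50
trimmed = trim_ends(example)
```

Applying the same `n` to every sequence of an alignment keeps all trimmed sequences at the same length, which the downstream diversity statistics require.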
The genetic diversity parameters, including the number of polymorphic sites (S), number of haplotypes (h), haplotype diversity (Hd), nucleotide diversity (π), average number of nucleotide differences (K), and indel diversity (k(i)), were calculated with DnaSP (Librado and Rozas, 2009). Analysis of molecular variance (AMOVA), Tajima's D, Fu's Fs, and their corresponding P-values were obtained with Arlequin (Excoffier and Lischer, 2010). Statistical analysis was performed with SPSS. MEGA X (Kumar et al., 2018) was used to construct the neighbor-joining tree. GenAlEx 6.5 (Peakall and Smouse, 2012) was used to calculate the genetic distances between all populations. STRUCTURE (Pritchard et al., 2000; Falush et al., 2003; Falush et al., 2007; Hubisz et al., 2009) was used to infer the optimal number of genetic clusters among all populations. The admixture model was run with a burn-in period of 100,000 iterations followed by 100,000 MCMC repetitions. K ranged from 1 to 10, and 10 independent runs were carried out for each K value. Structure Harvester (Evanno et al., 2005; Earl and vonHoldt, 2012), CLUMPP v1.1 (Rosenberg, 2007), and DISTRUCT (Rosenberg, 2004) were then used in turn to determine the best K, align the clustering results across runs, and visualize the results.
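To make the definitions of S, h, Hd, K, and π concrete, the following Python sketch computes them from a small alignment. It is an illustration under stated assumptions, not a reimplementation of DnaSP: the function name and the toy sequences are hypothetical, columns containing gaps or ambiguous bases are simply dropped, and indel diversity is not computed.

```python
from collections import Counter
from itertools import combinations


def diversity_stats(seqs):
    """Return (S, h, Hd, K, pi) for equal-length aligned DNA sequences."""
    n = len(seqs)
    if n < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")
    length = len(seqs[0])
    # Assumption: keep only columns where every sequence has A, C, G or T.
    cols = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    clean = ["".join(s[i] for i in cols) for s in seqs]
    # S: number of columns showing more than one base.
    S = sum(1 for i in range(len(cols)) if len({s[i] for s in clean}) > 1)
    # h and Hd: haplotype count and Hd = n/(n-1) * (1 - sum of squared frequencies).
    counts = Counter(clean)
    h = len(counts)
    Hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    # K: mean number of differences over all sequence pairs; pi: K per site.
    pairs = list(combinations(clean, 2))
    K = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    pi = K / len(cols) if cols else 0.0
    return S, h, Hd, K, pi


# Toy alignment (hypothetical data, four sequences of 8 bp).
toy = ["ACGTACGT", "ACGTACGA", "ACGTACGA", "TCGTACGA"]
S, h, Hd, K, pi = diversity_stats(toy)
```

For the toy alignment there are two polymorphic sites and three haplotypes, so Hd = 4/3 × (1 − 6/16) ≈ 0.833, K = 1.0, and π = 1/8 = 0.125.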
The six conserved domain fragments of the two markers, identified with MEME (https://meme-suite.org/meme/tools/meme), were extracted and concatenated to reduce the number of haplotypes. PopART (Leigh and Bryant, 2015) was used to construct the TCS haplotype network.
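The effect of restricting the analysis to conserved fragments can be sketched in Python as follows. The fragment coordinates, sample names, and sequences are hypothetical placeholders, not the regions reported by MEME for matR or nad5-1; the sketch only shows how concatenating selected regions and grouping identical sequences reduces the haplotype count.

```python
from collections import defaultdict


def concat_fragments(seq, regions):
    """Join slices of seq given as 0-based, end-exclusive (start, end) pairs."""
    return "".join(seq[start:end] for start, end in regions)


def collapse_haplotypes(samples, regions):
    """Map each distinct concatenated sequence to the samples that carry it."""
    groups = defaultdict(list)
    for name, seq in samples.items():
        groups[concat_fragments(seq, regions)].append(name)
    return dict(groups)


# Hypothetical aligned sequences and fragment coordinates.
samples = {
    "s1": "AAACCCGGGTTT",
    "s2": "AAACTCGGGTTT",
    "s3": "AAACCCGGGTTA",
}
full = collapse_haplotypes(samples, [(0, 12)])            # whole alignment
conserved = collapse_haplotypes(samples, [(0, 3), (6, 9)])  # two fragments only
```

Here the full alignment yields three haplotypes, whereas the two conserved fragments yield one, because the variable positions fall outside the selected regions.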