2.8. Genome resequencing and SNP calling
For genome resequencing, we sampled a total of 54 individuals of P. leopardus from two aquaculture farms in Hainan Province, China. Genomic DNA was extracted from the fin tissue of each fish. Paired-end libraries with an insert size of 300 bp were constructed according to the standard protocol (Illumina, USA) and sequenced on the Illumina HiSeq 2000 platform. To avoid the influence of low-quality reads on subsequent analyses, raw reads were checked and filtered using QC-Chain (Zhou et al., 2013), which removed the following types of reads: (1) reads containing > 10% unidentified nucleotides (Ns); (2) duplicated reads; (3) reads aligned to adapters; and (4) reads in which at least 10% of bases had a quality score < 20.
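The four read-filtering criteria can be illustrated with a minimal Python sketch. This is not the QC-Chain implementation: the function name `filter_reads`, the representation of a read as a (sequence, Phred scores) pair, the use of exact substring matching in place of adapter alignment, and the treatment of duplicates as identical sequences are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# A read is modelled as (sequence, per-base Phred quality scores); hypothetical layout.
Read = Tuple[str, List[int]]


def filter_reads(
    reads: Iterable[Read],
    adapters: Iterable[str],
    max_n_frac: float = 0.10,     # criterion (1): more than 10% N's is rejected
    min_qual: int = 20,           # criterion (4): bases below Q20 count as low quality
    max_lowq_frac: float = 0.10,  # criterion (4): 10% or more low-quality bases is rejected
) -> Iterator[Read]:
    """Yield reads that pass all four filters, keeping the first copy of duplicates."""
    adapter_list = [a.upper() for a in adapters]
    seen: Set[str] = set()  # sequences already emitted, used for duplicate removal
    for seq, quals in reads:
        seq_u = seq.upper()
        n = len(seq_u)
        if n == 0 or len(quals) != n:
            continue  # malformed record
        if seq_u.count("N") / n > max_n_frac:
            continue  # (1) too many unidentified nucleotides
        if seq_u in seen:
            continue  # (2) duplicate of a read that was already kept
        if any(a in seq_u for a in adapter_list):
            continue  # (3) adapter contamination (substring match stands in for alignment)
        low_quality = sum(q < min_qual for q in quals)
        if low_quality / n >= max_lowq_frac:
            continue  # (4) too many low-quality bases
        seen.add(seq_u)
        yield seq, quals
```

In practice, duplicate detection for paired-end data would consider both mates and adapter detection would tolerate mismatches; the sketch only makes the thresholds explicit.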
The quality-filtered reads were mapped to the genome assembly using BWA with default parameters (Li & Durbin, 2009). SNP calling was then performed at the population scale using GATK (McKenna et al., 2010), and allele frequencies were calculated using VCFtools. We further filtered the SNPs, retaining in the final SNP set only those with quality by depth (QD) > 2.0, mapping quality > 40, SNP quality > 30, minor allele frequency (MAF) ≥ 0.05, and missing rate ≤ 0.1.
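The SNP filtering thresholds can be written out as a short Python sketch. It does not call GATK or VCFtools; the `Site` record, its field names, and the helper functions are hypothetical, and sites are assumed to be biallelic with diploid genotypes coded 0 (reference) and 1 (alternate).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Diploid genotype as an allele pair; None marks a missing call (assumed encoding).
Genotype = Optional[Tuple[int, int]]


@dataclass
class Site:
    qual: float                # SNP quality
    qd: float                  # quality by depth
    mq: float                  # mapping quality
    genotypes: List[Genotype]  # one entry per sampled individual


def maf_and_missing(genotypes: List[Genotype]) -> Tuple[float, float]:
    """Return (minor allele frequency, missing rate) for a biallelic site."""
    if not genotypes:
        return 0.0, 1.0
    called = [g for g in genotypes if g is not None]
    missing_rate = (len(genotypes) - len(called)) / len(genotypes)
    if not called:
        return 0.0, missing_rate
    alt_count = sum(allele for g in called for allele in g)
    alt_freq = alt_count / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq), missing_rate


def keep_site(site: Site) -> bool:
    """Apply the thresholds stated in the text to one site."""
    maf, missing_rate = maf_and_missing(site.genotypes)
    return (
        site.qd > 2.0
        and site.mq > 40
        and site.qual > 30
        and maf >= 0.05
        and missing_rate <= 0.1
    )
```

With 54 diploid individuals and no missing calls, a site needs at least 6 copies of the minor allele among the 108 sampled alleles to reach MAF ≥ 0.05, and at most 5 individuals may lack a genotype call for the missing rate to stay ≤ 0.1.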
Results and discussion
3.1. Sequencing and genome size estimation
Genomic DNA from a P. leopardus individual approximately 1 year old (0.5 kg) (Figure 1), provided by Mingbo Aquatic Company (Laizhou, China), was used for genome sequencing. A 10× Genomics linked-read library was constructed and sequenced on the BGISEQ-500 platform (BGI, Shenzhen, China), producing a total of 152.61 Gb of raw reads. After quality filtering, we obtained 127.36 Gb of clean data (Table 1). Hi-C sequencing yielded 631,842,593 raw read pairs, amounting to 126.37 Gb of data. After quality control, 10.60% of the total raw reads were retained as valid Hi-C reads, that is, reads whose two ends mapped to different contigs and are therefore useful for Hi-C scaffolding (Table 1). Single-molecule sequencing with PacBio technology generated 1,255,828 reads totaling 18.05 Gb (Table 1).
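The reported yields can be cross-checked with simple arithmetic. The sketch below assumes that 1 Gb denotes 10^9 bases and that both mates of a Hi-C pair have the same length; the variable names are illustrative only.

```python
# Values as reported in the text.
raw_gb, clean_gb = 152.61, 127.36             # 10x Genomics linked reads
hic_pairs, hic_gb = 631_842_593, 126.37       # Hi-C raw read pairs and total yield
pacbio_reads, pacbio_gb = 1_255_828, 18.05    # PacBio reads and total yield

# Fraction of linked-read data retained after quality filtering.
clean_fraction = clean_gb / raw_gb

# Implied Hi-C read length: total bases divided by the number of reads (two per pair).
hic_read_len = hic_gb * 1e9 / (2 * hic_pairs)

# Implied mean PacBio read length.
pacbio_mean_len = pacbio_gb * 1e9 / pacbio_reads

print(f"clean fraction: {clean_fraction:.2%}")
print(f"Hi-C read length: {hic_read_len:.1f} bp")
print(f"PacBio mean read length: {pacbio_mean_len:.0f} bp")
```

Under these assumptions, about 83.5% of the linked-read data passed filtering, the Hi-C yield is consistent with 100 bp paired-end reads, and the mean PacBio read length is roughly 14.4 kb.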