2.8. Genome resequencing and SNP calling
For genome resequencing, we sampled a total of 54 P. leopardus individuals from two aquaculture farms in Hainan Province,
China. Genomic DNA was extracted from the fin tissue of each fish.
Paired-end libraries with an insert size of 300 bp were constructed
following the standard protocol (Illumina, USA) and sequenced on the
Illumina HiSeq 2000 platform. To limit the influence of low-quality
reads on subsequent analyses, raw reads were checked and filtered using
QC-Chain (Zhou et al., 2013), which removed (1) reads containing > 10%
unidentified nucleotides (N's); (2) duplicated reads; (3) reads aligned
to adapter sequences; and (4) reads in which 10% of bases had a quality
score < 20.
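The four read-level filters can be sketched in Python as below. This is a minimal illustration under stated assumptions, not QC-Chain's implementation or interface: it assumes Phred+33 quality encoding, treats duplicates as exact sequence matches among single-end records, detects adapters by exact substring match against an example adapter prefix (`ADAPTER`, an assumption), and interprets criterion (4) as 10% or more of bases below Q20. The helper names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

Read = Tuple[str, str, str]  # (name, sequence, Phred+33 quality string)

# Example adapter prefix (a common Illumina adapter start); an assumption for
# illustration, not necessarily the adapter used in this study.
ADAPTER = "AGATCGGAAGAGC"


def low_quality_fraction(qual: str, cutoff: int = 20, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is below `cutoff` (Phred+33 assumed)."""
    if not qual:
        return 1.0
    return sum(1 for ch in qual if ord(ch) - offset < cutoff) / len(qual)


def passes_qc(seq: str, qual: str, adapter: str = ADAPTER) -> bool:
    """Apply criteria (1), (3) and (4) to a single read."""
    seq = seq.upper()
    if not seq or seq.count("N") / len(seq) > 0.10:  # (1) > 10% N's
        return False
    if adapter in seq:  # (3) adapter match (exact substring only)
        return False
    if low_quality_fraction(qual) >= 0.10:  # (4) assumed: >= 10% bases with Q < 20
        return False
    return True


def filter_reads(reads: Iterable[Read]) -> Iterator[Read]:
    """Yield reads passing all four criteria; (2) drops exact-duplicate sequences."""
    seen: Set[str] = set()
    for name, seq, qual in reads:
        key = seq.upper()
        if key in seen:  # (2) duplicate of a read already encountered
            continue
        seen.add(key)
        if passes_qc(seq, qual):
            yield name, seq, qual


reads = [
    ("r1", "ACGTACGTAC", "IIIIIIIIII"),   # passes all filters
    ("r2", "ACGTACGTAC", "IIIIIIIIII"),   # duplicate of r1
    ("r3", "NNACGTACGT", "IIIIIIIIII"),   # 20% N's
    ("r4", "TTGCATGCAA", "IIIIIIII##"),   # 20% of bases at Q2
    ("r5", "AGATCGGAAGAGCTT", "I" * 15),  # contains the adapter prefix
]
kept = [name for name, _, _ in filter_reads(reads)]
print(kept)  # ['r1']
```

Real paired-end data would apply the duplicate and quality checks to both mates jointly and use alignment-based adapter detection; the sketch only shows how the thresholds combine.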
The quality-filtered reads were mapped to the genome assembly using BWA
with default parameters (Li & Durbin, 2009). SNP calling was then
performed at the population scale using GATK (McKenna et al., 2010),
and allele frequencies were calculated using VCFtools. The SNPs were
further filtered, and only those with quality by depth (QD) > 2.0,
mapping quality (MQ) > 40, SNP quality > 30, minor allele frequency
(MAF) ≥ 0.05 and missing rate ≤ 0.1 were retained in the final SNP set.
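The allele-frequency calculation and SNP-level filters can be illustrated with a small Python sketch. It is not VCFtools or GATK code: it assumes biallelic SNPs, diploid genotypes stored as allele-index pairs with `None` for a missing call, and per-site annotations (`qd`, `mq`, `qual`) already parsed from the VCF. The `Site` class and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Genotype = Optional[Tuple[int, int]]  # (0, 1) = heterozygous; None = missing call


@dataclass
class Site:
    qd: float    # quality by depth (QD)
    mq: float    # mapping quality (MQ)
    qual: float  # SNP (site) quality
    genotypes: List[Genotype]


def missing_rate(site: Site) -> float:
    """Fraction of individuals without a genotype call at this site."""
    if not site.genotypes:
        return 1.0
    return sum(g is None for g in site.genotypes) / len(site.genotypes)


def minor_allele_frequency(site: Site) -> float:
    """MAF among called alleles, assuming a biallelic site (0 = ref, 1 = alt)."""
    alleles = [a for g in site.genotypes if g is not None for a in g]
    if not alleles:
        return 0.0
    alt = sum(1 for a in alleles if a != 0) / len(alleles)
    return min(alt, 1.0 - alt)


def keep_snp(site: Site) -> bool:
    """Apply the QD, MQ, QUAL, MAF and missing-rate thresholds from the text."""
    return (site.qd > 2.0 and site.mq > 40 and site.qual > 30
            and minor_allele_frequency(site) >= 0.05
            and missing_rate(site) <= 0.1)


het_once = [(0, 1)] + [(0, 0)] * 9  # 1 alt allele out of 20 -> MAF 0.05
site_a = Site(qd=12.3, mq=60.0, qual=850.0, genotypes=het_once)
site_b = Site(qd=12.3, mq=35.0, qual=850.0, genotypes=het_once)  # fails MQ > 40
site_c = Site(qd=12.3, mq=60.0, qual=850.0,
              genotypes=[None, None] + [(0, 1)] * 4 + [(0, 0)] * 4)  # 20% missing
print([keep_snp(s) for s in (site_a, site_b, site_c)])  # [True, False, False]
```

Within VCFtools, the MAF and missingness thresholds are usually expressed with options such as `--maf 0.05` and `--max-missing 0.9`; the latter gives the minimum proportion of non-missing genotypes, so it permits at most 10% missing data.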
3. Results and discussion
3.1. Sequencing and genome size estimation
Genomic DNA from an approximately 1-year-old P. leopardus individual
(0.5 kg; Figure 1), provided by Mingbo Aquatic Company (Laizhou,
China), was used for genome sequencing. A 10× Genomics linked-read
library was constructed and sequenced on the BGISEQ-500 platform
(BGI, Shenzhen, China), producing a total of 152.61 Gb of raw reads.
After quality filtering, we obtained 127.36 Gb of clean data (Table 1).
For Hi-C sequencing, we obtained 631,842,593 raw read pairs, amounting
to 126.37 Gb of Hi-C data. After quality control, 10.60% of the total
raw reads were retained as valid Hi-C reads, with the two ends mapped
to different contigs, which makes them informative for Hi-C scaffolding
(Table 1). Single-molecule sequencing on the PacBio platform generated
1,255,828 reads totaling 18.05 Gb (Table 1).
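The sequencing yields above can be cross-checked with simple arithmetic. The Python sketch below uses only the reported figures to compute the fraction of 10× Genomics data retained after filtering, the average number of bases per Hi-C read pair, the approximate number of valid Hi-C pairs and the mean PacBio read length. Two interpretations are assumptions rather than statements from the text: that the 10.60% valid fraction applies to raw read pairs, and that roughly 200 bp per pair reflects 2 × 100 bp reads.

```python
# Figures as reported in the text (Table 1).
RAW_10X_GB = 152.61
CLEAN_10X_GB = 127.36
HIC_RAW_PAIRS = 631_842_593
HIC_RAW_GB = 126.37
HIC_VALID_FRACTION = 0.1060  # assumption: fraction of raw read pairs
PACBIO_READS = 1_255_828
PACBIO_GB = 18.05


def bases_per_unit(total_gb: float, count: int) -> float:
    """Average number of bases per read (or per read pair)."""
    return total_gb * 1e9 / count


clean_fraction = CLEAN_10X_GB / RAW_10X_GB
hic_bp_per_pair = bases_per_unit(HIC_RAW_GB, HIC_RAW_PAIRS)
hic_valid_pairs = HIC_RAW_PAIRS * HIC_VALID_FRACTION
pacbio_mean_len = bases_per_unit(PACBIO_GB, PACBIO_READS)

print(f"10x data retained after filtering: {clean_fraction:.2%}")  # 83.45%
print(f"Hi-C bases per raw pair: {hic_bp_per_pair:.1f}")            # 200.0
print(f"Approximate valid Hi-C pairs: {hic_valid_pairs:,.0f}")      # 66,975,315
print(f"PacBio mean read length: {pacbio_mean_len:,.0f} bp")        # 14,373 bp
```

The value of about 200 bp per pair is consistent with 2 × 100 bp paired-end Hi-C reads, although the read length itself is not stated here.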