2.3 Genome assembly and quality assessment
We estimated the genome size and heterozygosity of the P.
leopardus genome by k-mer analysis, using the quality-filtered reads.
The k-mer frequencies were computed with Jellyfish (v2.2.10)
(Marcais & Kingsford, 2011) using k = 17 and a maximum k-mer count of
10,000. The k-mer distribution was measured and plotted using
GenomeScope (Vurture et al., 2017). The genome size was calculated with
the formula $G = N_{17\text{-mer}} / D_{17\text{-mer}}$, where
$N_{17\text{-mer}}$ is the total number of 17-mers and
$D_{17\text{-mer}}$ denotes the peak frequency of 17-mers.
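
The genome size formula can be illustrated with a minimal Python sketch. It assumes the k-mer histogram is available as (depth, number of distinct k-mers) pairs, as in the two-column output of a k-mer counter. The function name, the toy histogram and the `min_depth` cutoff used to skip the low-depth error peak are illustrative assumptions and are not taken from the analysis described here.

```python
def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size as G = N / D from a k-mer histogram.

    histogram: iterable of (depth, n_distinct_kmers) pairs.
    min_depth: assumed cutoff; lower depths are treated as sequencing errors.
    """
    usable = [(depth, count) for depth, count in histogram if depth >= min_depth]
    # N: total number of k-mers, each distinct k-mer weighted by its depth.
    total_kmers = sum(depth * count for depth, count in usable)
    # D: the depth at which the number of distinct k-mers peaks.
    peak_depth = max(usable, key=lambda pair: pair[1])[0]
    return total_kmers / peak_depth, peak_depth


# Toy histogram (hypothetical values): an error peak at depth 1-2
# and a main peak at depth 10.
toy_histogram = [(1, 5000), (2, 800), (3, 100), (8, 300), (9, 700),
                 (10, 1000), (11, 700), (12, 300)]
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated genome size = {size:.0f} bp")
```

Whether low-depth k-mers should be excluded from the total, and at which cutoff, depends on the data set; the value used above is only a placeholder.
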
We assembled the 10× Genomics short reads de novo into contigs
and scaffolds using Supernova (v1.2) (Weisenfeld et al., 2017). Gaps in
the initial assembly were filled with GapCloser (v1.12) (Luo et al.,
2012) with the parameters “avg_ins=364, max_ins=500 and
min_ins=260”. The draft assembly was then anchored and oriented into a
chromosome-scale assembly using the Hi-C scaffolding approach. First,
the raw Hi-C reads were filtered with HiC-Pro (v2.8.0) (Servant et al.,
2015). Then 3D-DNA (v170123) (Dudchenko et al., 2017) with the parameters
“-m haploid -s 0 -c 24” was used to anchor the primary contigs and
scaffolds into chromosomes. The inter- and intra-chromosomal contact maps
were built and visualized with Juicebox (Durand et al., 2016).
To further improve the integrity and accuracy of the genome assembly, we
employed TGS-GapCloser, which uses low-depth (≥ 10×) single-molecule
long reads without any error correction to close gaps in the
draft assembly (Xu et al., 2019). The long reads were split into
three groups: all reads (with options --min_idy 0.2
--min_match 200 --r_round 1), reads with length ≥ 20 kb (with
options --min_idy 0 --min_match 0 --r_round 3) and reads with
lengths of 2-20 kb (with options --min_idy 0 --min_match 0
--r_round 3). Each group was used to fill the corresponding
aligned gaps.
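
The grouping of long reads by length can be sketched as follows. This is only an illustration under stated assumptions: the function name and toy reads are hypothetical, and the boundary handling (2 kb ≤ length < 20 kb for the middle group, length ≥ 20 kb for the long group) is one possible reading of the thresholds given in the text.

```python
def group_reads_by_length(reads, mid_min=2_000, long_min=20_000):
    """Assign read names to the three groups described in the text.

    reads: iterable of (name, sequence) pairs.
    Boundaries are assumptions: "mid" is mid_min <= length < long_min,
    "long" is length >= long_min, and "all" contains every read.
    """
    groups = {"all": [], "long": [], "mid": []}
    for name, sequence in reads:
        groups["all"].append(name)
        if len(sequence) >= long_min:
            groups["long"].append(name)
        elif len(sequence) >= mid_min:
            groups["mid"].append(name)
    return groups


# Hypothetical reads chosen to sit on the group boundaries.
toy_reads = [("r1", "A" * 500), ("r2", "A" * 2_000),
             ("r3", "A" * 19_999), ("r4", "A" * 20_000)]
groups = group_reads_by_length(toy_reads)
print(groups)
```
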
The completeness of the genome assembly was assessed by Benchmarking
Universal Single-Copy Orthologs (BUSCO) (Waterhouse et al., 2017) and GC
content analyses. The single-copy orthologues of actinopterygii_odb9
(BUSCO, v2.0) were searched against the assembled genome using the BUSCO
tool. The GC content and average sequencing depth across the genome were
also measured in 10 kb non-overlapping windows, and windows
containing more than 50% Ns were filtered out. No external contamination
was found in the genome.
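
The windowed GC calculation can be sketched in Python as below. The function name and the toy sequence are hypothetical, and two details are assumptions rather than statements about the analysis above: GC content is computed over called (non-N) bases only, and a shorter final window is kept.

```python
def gc_in_windows(sequence, window=10_000, max_n_fraction=0.5):
    """Return (start, gc_fraction) for non-overlapping windows.

    Windows in which more than max_n_fraction of the bases are N are skipped.
    GC is computed over non-N bases only (an assumption of this sketch).
    """
    results = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window].upper()
        n_count = chunk.count("N")
        if n_count / len(chunk) > max_n_fraction:
            continue  # window is mostly gap sequence, so it is filtered out
        called = len(chunk) - n_count
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / called))
    return results


# Toy example with a 4 bp window so the behaviour is easy to follow.
toy_sequence = "GGCCATATNNNAGC"
windows = gc_in_windows(toy_sequence, window=4)
print(windows)
```

With a real assembly, the same function would be applied to each scaffold with the default 10 kb window.
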