2.3 | Genome assembly
We constructed a de novo assembly of the St genome of Pse.
libanotica by combining sequences from three different technologies:
Illumina PE150 short-read sequencing, Nanopore long-read sequencing, and
Hi-C chromosome conformation capture.
The clean Nanopore reads after filtering and decontamination were
assembled with wtdbg2. The assembly was then iteratively polished with Pilon
(v1.22), for which the clean Illumina reads were aligned to the
pre-assembled contigs using BWA with default parameters (Li & Durbin,
2010; Walker et al., 2014). Further, we combined the final pre-assembled
contig sequences from Nanopore sequencing and clean paired-read data
from Illumina sequencing into scaffolds using the SSPACE (v3.0) tool (Bolger
et al., 2014). Genome assembly completeness was assessed with BUSCO (v3)
against the plantae database of 1440 single-copy orthologues, with a
BLAST E-value threshold of 1 × 10⁻⁵ (Simão et al.,
2015).
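As an illustration of how a completeness assessment of this kind is typically summarised, the following minimal Python sketch converts BUSCO category counts into percentages of the 1440 orthologues searched. The class name, function name, and the counts in the example are hypothetical and are not results from this study; the output string only mimics the compact notation commonly used in BUSCO summaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoCounts:
    """Counts of orthologues per BUSCO category (hypothetical container)."""
    single: int      # complete and single-copy
    duplicated: int  # complete and duplicated
    fragmented: int  # partially recovered
    missing: int     # not found

    @property
    def total(self) -> int:
        return self.single + self.duplicated + self.fragmented + self.missing


def busco_summary(counts: BuscoCounts) -> str:
    """Format category counts as percentages of all orthologues searched."""
    n = counts.total
    if n == 0:
        raise ValueError("no orthologues were searched")

    def pct(x: int) -> float:
        return 100.0 * x / n

    complete = counts.single + counts.duplicated
    return (
        f"C:{pct(complete):.1f}%"
        f"[S:{pct(counts.single):.1f}%,D:{pct(counts.duplicated):.1f}%],"
        f"F:{pct(counts.fragmented):.1f}%,M:{pct(counts.missing):.1f}%,n:{n}"
    )


# Illustrative counts only (assumed, not measured); they sum to 1440.
example = BuscoCounts(single=1296, duplicated=72, fragmented=36, missing=36)
summary = busco_summary(example)
```

The proportion of complete orthologues (single-copy plus duplicated) is the figure usually quoted as the completeness of an assembly.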
The Hi-C libraries were prepared as described previously
(Lieberman-Aiden et al., 2009). Hi-C library sequences were aligned to the
draft input assembly using a modified SNAP read mapper (Zaharia et al.,
2011) (http://snap.cs.berkeley.edu). HiRise was used to analyze the
separations of Hi-C read pairs mapped within draft scaffolds and to
generate a likelihood model of the genomic distance between read pairs.
The model was used to identify and break putative mis-joins,
score prospective joins, and select joins above a threshold.
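The idea of scoring joins with a likelihood model can be sketched in Python as follows. This is a simplified illustration and not the model implemented in HiRise: it assumes an empirical, binned density of read-pair separations learned from pairs mapped within scaffolds, a uniform noise density of one over the genome size, and a log-likelihood ratio summed over the pairs that link two scaffolds. All function names, bin edges, the genome size, and the threshold are hypothetical.

```python
import math
from bisect import bisect_right


def fit_density(separations, edges):
    """Estimate a per-bp density of read-pair separations on the given bins.

    Each bin starts with a pseudocount of 1 so that no bin has zero density.
    Separations outside the bin range are ignored.
    """
    counts = [1] * (len(edges) - 1)
    for d in separations:
        i = bisect_right(edges, d) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    total = sum(counts)
    return [c / total / (edges[i + 1] - edges[i]) for i, c in enumerate(counts)]


def join_score(implied_separations, edges, density, genome_size):
    """Log-likelihood ratio of 'scaffolds are adjacent' versus 'pairs are noise'.

    implied_separations holds, for each read pair linking the two scaffolds,
    the separation the pair would have if the candidate join were real.
    """
    noise = 1.0 / genome_size  # assumed uniform density for unrelated pairs
    score = 0.0
    for d in implied_separations:
        i = bisect_right(edges, d) - 1
        # Outside the modelled range a pair contributes no evidence.
        p = density[i] if 0 <= i < len(density) else noise
        score += math.log(p / noise)
    return score


def select_joins(candidates, edges, density, genome_size, threshold):
    """Keep the names of candidate joins whose score exceeds the threshold."""
    return [
        name
        for name, seps in candidates.items()
        if join_score(seps, edges, density, genome_size) > threshold
    ]


# Illustrative data only (assumed): most within-scaffold pairs are close together.
edges = [0, 1_000, 10_000, 100_000, 1_000_000]
within = [500] * 80 + [5_000] * 15 + [50_000] * 4 + [500_000]
density = fit_density(within, edges)
candidates = {"A+B": [500, 800, 2_000], "A+C": [500_000] * 3}
accepted = select_joins(candidates, edges, density, genome_size=1e9, threshold=20.0)
```

In this toy setting, a candidate join supported by pairs with short implied separations receives a higher score than one supported only by pairs with long implied separations, which is the behaviour a threshold on the score relies on.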