2.3 | Genome assembly
We constructed a de novo assembly of the St genome of Pse. libanotica by combining data from three technologies: Illumina PE150 short-read sequencing, Nanopore long-read sequencing, and Hi-C chromosome conformation capture.
The clean Nanopore reads obtained after filtering and decontamination were assembled with wtdbg2. Iterative polishing was conducted with Pilon (v1.22), for which the clean Illumina reads were aligned to the pre-assembled contigs using BWA with default parameters (Li & Durbin, 2010; Walker et al., 2014). The final pre-assembled contig sequences from Nanopore sequencing and the clean paired-end reads from Illumina sequencing were then combined into scaffolds using SSPACE (v3.0) (Bolger et al., 2014). Genome assembly completeness was assessed with BUSCO (v3) against the plantae database of 1440 single-copy orthologues, with a BLAST E-value threshold of 1 × 10⁻⁵ (Simão et al., 2015).
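As an illustration of the iterative polishing step only, the following Python sketch builds the command lines for successive align-and-polish rounds. It is a sketch under stated assumptions rather than the pipeline actually run: the use of `bwa mem` and `samtools`, the number of rounds, the location of `pilon.jar`, and all file names are hypothetical and are not given in the text.

```python
def polishing_round_commands(contigs, reads_1, reads_2, round_idx,
                             pilon_jar="pilon.jar"):
    """Return (commands, polished_fasta) for one align-and-polish round.

    Assumptions (not stated in the text): bwa mem as the BWA algorithm,
    samtools for sorting and indexing, and the file naming scheme.
    """
    prefix = f"pilon_round{round_idx}"
    bam = f"{prefix}.bam"
    commands = [
        # Index the current contigs and align the Illumina pairs with BWA defaults.
        f"bwa index {contigs}",
        f"bwa mem {contigs} {reads_1} {reads_2} | samtools sort -o {bam} -",
        f"samtools index {bam}",
        # Pilon writes the corrected sequence to <prefix>.fasta.
        f"java -jar {pilon_jar} --genome {contigs} --frags {bam} --output {prefix}",
    ]
    return commands, f"{prefix}.fasta"


def polishing_plan(contigs, reads_1, reads_2, rounds=3):
    """Chain several rounds; each round polishes the previous round's output.

    The default of three rounds is a hypothetical choice for illustration.
    """
    plan = []
    current = contigs
    for i in range(1, rounds + 1):
        commands, current = polishing_round_commands(current, reads_1, reads_2, i)
        plan.extend(commands)
    return plan


if __name__ == "__main__":
    for cmd in polishing_plan("contigs.fa", "illumina_R1.fq.gz", "illumina_R2.fq.gz"):
        print(cmd)
```

The sketch only prints the commands, so it can be inspected without any of the tools installed. Feeding each round's output into the next is what makes the polishing iterative.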
The Hi-C libraries were prepared as described previously (Lieberman-Aiden et al., 2009). The Hi-C library sequences were aligned to the draft input assembly using a modified SNAP read mapper (Zaharia et al., 2011) (http://snap.cs.berkeley.edu). HiRise was used to analyze the separations of Hi-C read pairs mapped within draft scaffolds and to generate a likelihood model of the genomic distance between read pairs. The model was then used to identify and break putative misjoins, score prospective joins, and select joins above a threshold.
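The idea of scoring prospective joins with a distance likelihood model can be illustrated with a toy Python sketch. This is not the HiRise implementation: the log2-binned empirical density, the uniform noise model, and all function and parameter names are simplifying assumptions introduced here for illustration.

```python
import math

N_BINS = 40  # log2 bins; covers separations up to 2**40 bp (assumed upper bound)


def fit_separation_model(separations, pseudocount=1.0):
    """Estimate a per-bp density of read-pair separations.

    `separations` are distances (bp) between read pairs mapped within
    the same draft scaffold. Bin k holds separations in [2**k, 2**(k+1)).
    """
    counts = [pseudocount] * N_BINS
    for d in separations:
        if d >= 1:
            k = min(int(d).bit_length() - 1, N_BINS - 1)
            counts[k] += 1
    total = sum(counts)
    # Bin probability divided by bin width (2**k) gives a density per bp.
    return [c / total / (2 ** k) for k, c in enumerate(counts)]


def log_density(model, d):
    """Log density of a single separation d under the fitted model."""
    k = min(max(int(d), 1).bit_length() - 1, N_BINS - 1)
    return math.log(model[k])


def join_score(model, link_offsets, gap, genome_size):
    """Log-likelihood ratio of a prospective join versus uniform noise.

    Each (a, b) in `link_offsets` is the distance of the two reads of a
    linking pair from the scaffold ends being joined, so the implied
    separation if the join is real is a + gap + b.
    """
    log_noise = -math.log(genome_size)
    return sum(log_density(model, a + gap + b) - log_noise
               for a, b in link_offsets)


def accept_join(score, threshold=0.0):
    """Keep a join only if its score exceeds the (hypothetical) threshold."""
    return score > threshold
```

Under this toy model, links that imply short separations, where most within-scaffold pairs fall, raise the score, while links that imply separations rarely seen within scaffolds lower it. The same score could in principle flag a putative misjoin when pairs spanning a position inside a scaffold are less likely than expected.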