Genome Assembly, Decontamination and Genome Assessment
To assemble the N. riversi genome, we first generated a long-read genome assembly using Canu v2.0 (Koren et al. 2017). Default parameters were used, with the estimated genome size set to 230 Mb, except that the lowCovSpan and lowCovDepth parameters were set to 0.5 and 0, respectively. To correct errors in the long-read assembly, we polished the draft genome by first mapping the Illumina short reads to it using Minimap2 (Li 2018); the resulting alignments (SAM files) and the raw reads (FASTQ files) were then used to polish the draft genome using Racon (Vaser et al. 2017). This draft long-read assembly was used as a benchmark to assess our final genome assembly, as well as to identify non-target sequences. To provide a quantitative assessment of the completeness of the draft genome, we used BUSCO with the Endopterygota reference gene set. We then used two methods to identify possible non-target sequences in the assembly: BlobTools v1 (Laetsch & Blaxter 2017), which uses GC content and short-read coverage to help identify foreign DNA, and the sendsketch tool in BBMap v38.86 (Bushnell 2014), which uses a k-mer based approach to match sequences to reference databases. For sendsketch, the number of sequences was set to 100k and the sketch length was set to 200k. We identified the most abundant microbial taxa (Spiroplasma and Acinetobacter) and used their GenBank reference sequences, as well as the N. brevicollis mitochondrial genome, to filter the long-read data using Minimap2 (Li 2018). The short-read genomic data were also screened against these reference sequences using the bbduk tool in BBMap, and then normalized using bbnorm. With these refined read sets, we generated a final assembly using the hybrid assembler Haslr (Haghshenas et al. 2020). Runs of Haslr were conducted at different settings of long-read coverage (10x, 20x and 25x), and each assembly was then evaluated using BUSCO with the Endopterygota reference gene set.
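The short-read polishing step described above can be sketched as two command lines, built here as Python argument lists. This is a minimal illustration and not the exact invocation used in the study: the file names are hypothetical, and the Minimap2 short-read preset (-x sr) is an assumption, since the options used are not reported. Both tools write their results to standard output, so the helper redirects that output to a file.

```python
import subprocess
from typing import List


def polishing_commands(draft: str, short_reads: str, sam: str) -> List[List[str]]:
    """Return the mapping and polishing commands for one polishing round."""
    # minimap2: -a emits SAM; -x sr is the short-read preset (assumed here).
    map_cmd = ["minimap2", "-a", "-x", "sr", draft, short_reads]
    # racon: positional arguments are <reads> <overlaps> <target sequences>.
    polish_cmd = ["racon", short_reads, sam, draft]
    return [map_cmd, polish_cmd]


def run_to_file(cmd: List[str], out_path: str) -> None:
    """Run a command and write its standard output to out_path."""
    with open(out_path, "w") as handle:
        subprocess.run(cmd, stdout=handle, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    map_cmd, polish_cmd = polishing_commands(
        "draft.fasta", "illumina.fastq", "mapped.sam"
    )
    print(" ".join(map_cmd), "> mapped.sam")
    print(" ".join(polish_cmd), "> polished.fasta")
```

Running the script only prints the two commands; calling run_to_file on each, in order, would execute them if Minimap2 and Racon are installed.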
As a final assembly step, we employed paired-end RNAseq data to join contigs using P_RNA_scaffolder (Zhu et al. 2018). BUSCO was run again on this assembly, and we compared the final RNA-scaffolded Haslr 20x assembly to the Tribolium castaneum v5.2 RefSeq assembly (GCF_000002335.3) using QUAST v5.1.0rc1 (Gurevich et al. 2013). The genome assembly has been made publicly available on NCBI (WGS Accession: JADQWA010000000).
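Among the contiguity statistics that QUAST reports is N50, the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. As an illustration of how that statistic is defined, a minimal sketch follows; the contig lengths are made-up values, not results from the N. riversi assembly, and the function is not QUAST's own implementation.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_contigs))
```

For the toy input, the total length is 32, and the two longest contigs (8 and 8) already cover half of it, so the N50 is 8.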