Genome Assembly, Annotation, Structural Comparison and Synteny
The draft long-read CANU assembly resulted in a genome size of 189 Mb,
comprised of 2,098 contigs. However, a proportion of the genome was
identified as non-target sequence (Figure S2, Table S3 and Table S4), principally Proteobacteria and Tenericutes,
and removal of these reads as well as mitochondrial reads reduced the
long-read dataset by 44.6 Mb. After non-target sequence removal, a
hybrid assembly was generated using both the long-read and short-read
data. Different long-read coverage thresholds were explored (10x, 20x,
and 25x), with genome completeness assessed using BUSCO. The 20x
coverage assembly performed best according to BUSCO, with 95.0%
completeness (complete and single copy = 94.8%, complete and duplicated
= 0.2%, fragmented = 2.2%, and missing = 2.8%, out of 2,442 single
copy orthologs), and resulted in 2,137 contigs summing to 147.3 Mb in
total size (average length 69.0 Kb and max length 1.28 Mb). Alternative
HASLR assemblies had lower BUSCO completeness
(93.6% for 10x and 94.8% for 25x), with increases in the missing
single copy orthologs. The 20x coverage genome was then scaffolded using
RNAseq paired-end read data, which resulted in 1,636 scaffolds and 147.4
Mb genome size (average scaffold length 90.1 Kb and max length 1.37 Mb).
This final genome assembly had an improved BUSCO score with 95.3%
completeness (complete and single copy = 95.1%, complete and duplicated
= 0.2%, fragmented = 2.1%, and missing = 2.6%).
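
The summary statistics above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the function names are illustrative, 1 Mb is assumed to equal 1,000 Kb, and the inputs are the rounded values reported in the text. Under these assumptions the BUSCO categories sum to 100% for both assemblies, the mean scaffold length comes out at about 90.1 Kb, and the mean contig length at about 68.9 Kb, a small difference from the reported 69.0 Kb that may reflect rounding of the reported totals.

```python
# Consistency check on the reported assembly statistics.
# All numbers are copied from the text; the helper names are illustrative.

def busco_complete(single_pct, duplicated_pct):
    """Complete BUSCOs = complete single-copy + complete duplicated."""
    return single_pct + duplicated_pct


def busco_total(single_pct, duplicated_pct, fragmented_pct, missing_pct):
    """Sum of all four BUSCO categories; should be close to 100%."""
    return single_pct + duplicated_pct + fragmented_pct + missing_pct


def mean_length_kb(total_mb, n_sequences):
    """Mean sequence length in Kb, assuming 1 Mb = 1,000 Kb."""
    return total_mb * 1000.0 / n_sequences


# BUSCO categories (percent) as reported for each assembly.
assemblies = {
    "hybrid 20x contigs": (94.8, 0.2, 2.2, 2.8),
    "RNAseq-scaffolded": (95.1, 0.2, 2.1, 2.6),
}

# Total size (Mb) and number of sequences as reported.
sizes = {
    "hybrid 20x contigs": (147.3, 2137),
    "RNAseq-scaffolded": (147.4, 1636),
}

if __name__ == "__main__":
    for name, (s, d, f, m) in assemblies.items():
        total_mb, n = sizes[name]
        print(
            f"{name}: complete = {busco_complete(s, d):.1f}%, "
            f"all categories = {busco_total(s, d, f, m):.1f}%, "
            f"mean length = {mean_length_kb(total_mb, n):.1f} Kb"
        )
```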
The genome was then annotated using automated prediction, database
searches and RNAseq data. This resulted in 17,895 genes and 23,973
proteins (Table 2), as well as 150,583 repeat regions
(comprising 201,059 copies in 19 families, Table S5). The
official gene set has a 93.3% BUSCO completeness score (complete and
single copy = 80.5%, complete and duplicated = 12.8%, fragmented =
2.4%, and missing = 4.3%). The genome of N. riversi is
smaller and less fragmented than those of the 11 Coleoptera species in
RefSeq (Table 1; Table S6; Figure S3), yet the number
of genes falls within the range of published gene sets. Similarly, the
repeat content of N. riversi falls within the observed range
of other Coleoptera species, albeit at the low end (Table 1).
One exceptional difference involves the distribution of intron lengths in N. riversi, which is truncated compared to other beetle species
in having a statistically significant reduction of larger introns
(> 1,000 bp) and consequently a smaller total size of
intronic regions (Figure S4). Finally, an analysis of collinear
genes across N. riversi and six other beetle genomes
(Figure S5) shows that estimates of the number of collinear
genes and the proportion of collinear genes are impacted by genome
fragmentation. In comparisons involving T.
castaneum (the most contiguous assembly), modest synteny is found
across Coleoptera, with roughly 2,000 genes (6% of the total) showing
collinearity in N. riversi.
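
The intron-length comparison described above rests on two quantities per genome: the share of introns longer than 1,000 bp and the total intronic size. The sketch below shows one way these could be computed from a list of intron lengths. It is an illustration only: the function name is hypothetical, the example lengths are made-up toy values rather than data from any beetle genome, and it does not reproduce the statistical test behind Figure S4.

```python
# Summarise an intron-length distribution.
# The function name and the toy input are hypothetical, for illustration only.

def summarize_introns(lengths_bp, threshold_bp=1000):
    """Return the intron count, the fraction longer than threshold_bp,
    and the total intronic size in bp."""
    if not lengths_bp:
        raise ValueError("lengths_bp must contain at least one intron length")
    n_large = sum(1 for length in lengths_bp if length > threshold_bp)
    return {
        "n_introns": len(lengths_bp),
        "fraction_large": n_large / len(lengths_bp),
        "total_bp": sum(lengths_bp),
    }


if __name__ == "__main__":
    # Made-up lengths (bp); real input would come from the genome annotation.
    toy_lengths = [60, 85, 120, 450, 1500, 3200]
    summary = summarize_introns(toy_lengths)
    print(
        f"{summary['n_introns']} introns, "
        f"{summary['fraction_large']:.2f} longer than 1,000 bp, "
        f"{summary['total_bp']} bp total"
    )
```

Comparing these summaries across species would show whether a genome has fewer long introns, which in turn lowers its total intronic size.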