Comparison of Coleoptera Genome Structure and Synteny
To test for structural genomic differences between N. riversi and
other beetles, we compared estimates of genome size, assembly
fragmentation, intron length, and transposable element (TE) repeat
content. For genome size, we compared N. riversi with the
estimates for eight other Adephaga species reported in Pflug et
al. (2020), coupled with 11 species of Polyphaga that had sufficient
read coverage data available from NCBI’s RefSeq database
(O’Leary et al. 2016). These 11 reference taxa (characteristics
of these beetle genomes can be found in Table 1) include: Aethina
tumida (Atum), Agrilus planipennis (Apla), Anoplophora glabripennis
(Agla), Dendroctonus ponderosae (Dpon), Diabrotica virgifera (Dvir),
Leptinotarsa decemlineata (Ldec), Nicrophorus vespilloides (Nves),
Onthophagus taurus (Otau), Photinus pyralis (Ppyr), Sitophilus
oryzae (Sory), and T. castaneum (Tcas). For all
20 species, genome size was estimated with several k-mer-based
methods, including GenomeScope, basic CovEST, and repeat
CovEST (Hozza et al. 2015; Ranallo-Benavidez et
al. 2020), together with de novo assembly size as reported by Pflug
et al. (2020) and RefSeq. The k-mer count distributions
were calculated using Jellyfish v2.2.3 (Marcais & Kingsford
2011). To compare structural genomic features, we focused on N.
riversi and the 11 annotated RefSeq genomes. Assembly
statistics were obtained directly from RefSeq, and the intron
sequences were extracted from the latest assembled genomes and GFF files
(downloaded on July 27, 2020) using GFF_Ex v2.3 (Rastogi &
Gupta 2014). TE repeat content was either
retrieved from published genome studies or estimated from the available
assembly using RepeatModeler and RepeatMasker. Intron
sequences were used to calculate the intron length distribution, which
was then compared with the values reported in the RefSeq annotation
results. To test whether N. riversi has smaller introns, we
used a one-sided Kolmogorov–Smirnov test to assess whether the cumulative
distribution of intron lengths in N. riversi was less than that of the
other coleopteran species.
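As an illustration of the one-sided comparison described above, the following minimal Python sketch computes a two-sample Kolmogorov–Smirnov statistic on two lists of intron lengths. It is not the code used in this study. The function name `ks_one_sided`, the example lengths, the direction convention (the statistic is large when the empirical CDF of the focal species lies above that of the comparison species, that is, when its introns tend to be shorter), and the leading-order asymptotic p-value approximation are all assumptions made for illustration.

```python
from bisect import bisect_right
from math import exp


def ks_one_sided(focal, other):
    """One-sided two-sample KS sketch (hypothetical helper, not from the study).

    Returns (d_plus, approx_p), where d_plus = max_t [F_focal(t) - F_other(t)].
    A large d_plus suggests that values in `focal` tend to be smaller than
    values in `other`. approx_p uses the leading-order asymptotic
    approximation exp(-2 * d_plus^2 * n*m / (n+m)), which is only a rough
    guide for small samples.
    """
    x = sorted(focal)
    y = sorted(other)
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    d_plus = 0.0
    # The empirical CDFs only change at observed values, so it is
    # sufficient to evaluate the difference at those points.
    for t in set(x) | set(y):
        f_focal = bisect_right(x, t) / n   # fraction of focal values <= t
        f_other = bisect_right(y, t) / m   # fraction of other values <= t
        d_plus = max(d_plus, f_focal - f_other)

    approx_p = exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, approx_p


# Toy intron lengths in bp (made-up values, for illustration only).
focal_lengths = [50, 60, 70, 80]
other_lengths = [500, 600, 700, 800]
d_plus, approx_p = ks_one_sided(focal_lengths, other_lengths)
```

In practice, an established statistical library would normally be preferred over a hand-written routine, and the direction of the alternative hypothesis should be checked against that library's convention, since implementations differ in how they name the two one-sided alternatives.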
To examine the extent of synteny among beetle genomes, we used
MCScanX (Wang et al. 2012) to measure the number of
collinear genes found along scaffolds in pairwise comparisons. We
focused on comparing the N. riversi assembly to the six most
contiguous beetle genomes available on RefSeq: Aethina
tumida (contig N50 = 299 Kb), Agrilus planipennis (scaffold N50
= 1.11 Mb), Onthophagus taurus (scaffold N50 = 337 Kb), Photinus
pyralis (scaffold N50 = 47.02 Mb), Sitophilus
oryzae (scaffold N50 = 2.86 Mb), and Tribolium castaneum
(scaffold N50 = 4.46 Mb). We calculated the number of contiguous genes,
as well as the proportion of contiguous genes, among the seven genomes.
The pheatmap package in R (Kolde & Kolde 2015) was used
to generate visualizations of the results.
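To make the pairwise summary concrete, the sketch below tallies, for each ordered pair of genomes, the number of distinct genes that fall in collinear blocks and the proportion they represent. It is a hedged illustration rather than the procedure used here. The function name `collinear_proportions`, the input layout (gene pairs already extracted from the synteny output), the species code `Nriv`, the gene identifiers and counts, and the choice of each genome's total annotated gene count as the denominator are all assumptions.

```python
def collinear_proportions(collinear_pairs, gene_counts):
    """Summarise collinear genes per ordered genome pair (hypothetical helper).

    collinear_pairs: dict mapping (species_a, species_b) to a list of
        (gene_in_a, gene_in_b) tuples taken from collinear blocks.
    gene_counts: dict mapping species to its total number of annotated genes
        (assumed denominator for the proportion).
    Returns a dict mapping (row_species, col_species) to
        (number of distinct row_species genes in collinear blocks, proportion).
    """
    summary = {}
    for (sp_a, sp_b), pairs in collinear_pairs.items():
        pairs = list(pairs)
        # Count each gene once, even if it appears in several blocks.
        genes_a = {gene_a for gene_a, _ in pairs}
        genes_b = {gene_b for _, gene_b in pairs}
        summary[(sp_a, sp_b)] = (len(genes_a), len(genes_a) / gene_counts[sp_a])
        summary[(sp_b, sp_a)] = (len(genes_b), len(genes_b) / gene_counts[sp_b])
    return summary


# Toy input with made-up gene identifiers and gene counts.
pairs = {("Nriv", "Tcas"): [("n1", "t1"), ("n2", "t2"), ("n2", "t3")]}
counts = {"Nriv": 10, "Tcas": 20}
summary = collinear_proportions(pairs, counts)
```

The resulting dictionary can be reshaped into a species-by-species matrix, which is the form a heatmap function expects. The matrix is not necessarily symmetric, because the two genomes in a pair can differ both in gene count and in how many of their genes sit in collinear blocks.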