Comparison of Coleoptera Genome Structure and Synteny
To test for structural genomic differences between N. riversi and other beetles, we compared estimates of genome size, assembly fragmentation, intron length, and transposable element (TE) repeat content. For genome size, we compared N. riversi with the estimates for eight other Adephaga species reported in Pflug et al. (2020), coupled with 11 species of Polyphaga that had sufficient read coverage data available from NCBI's RefSeq database (O'Leary et al. 2016). These 11 reference taxa (characteristics of these beetle genomes can be found in Table 1) include Aethina tumida (Atum), Agrilus planipennis (Apla), Anoplophora glabripennis (Agla), Dendroctonus ponderosae (Dpon), Diabrotica virgifera (Dvir), Leptinotarsa decemlineata (Ldec), Nicrophorus vespilloides (Nves), Onthophagus taurus (Otau), Photinus pyralis (Ppyr), Sitophilus oryzae (Sory), and Tribolium castaneum (Tcas). For all 20 species, several k-mer-based methods were used to estimate genome size, including GenomeScope, basic CovEST, and repeat CovEST (Hozza et al. 2015; Ranallo-Benavidez et al. 2020), alongside the de novo assembly sizes reported by Pflug et al. (2020) and RefSeq. The k-mer count distributions were calculated using Jellyfish v2.2.3 (Marcais & Kingsford 2011). To compare structural genomic features, we focused on N. riversi and the 11 annotated RefSeq genomes. Assembly statistics were obtained directly from RefSeq, and intron sequences were extracted from the latest assembled genomes and gff files (downloaded on July 27, 2020) using GFF_Ex v2.3 (Rastogi & Gupta 2014). Transposable element (TE) repeat content was either retrieved from published genome studies or estimated from the available assembly using RepeatModeler and RepeatMasker. The intron sequences were used to calculate intron length distributions, which were then compared with the values reported in the RefSeq annotation results. To test whether N. riversi has smaller introns, we used a one-sided Kolmogorov–Smirnov test of whether the intron length distribution of N. riversi was shifted toward smaller values relative to those of the other Coleopteran species.
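As an illustration of the intron length comparison, the following minimal sketch computes a one-sided two-sample Kolmogorov–Smirnov statistic in plain Python. It is not the code used in this study: the function names, the toy intron lengths, and the use of the standard asymptotic approximation for the p-value are all assumptions made for the example. If one species has shorter introns, its empirical cumulative distribution rises earlier and therefore lies above that of the other species, so the statistic is the largest positive gap between the two curves.

```python
import bisect
import math


def ecdf_at(sorted_values, x):
    """Empirical CDF: fraction of observations less than or equal to x."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)


def ks_one_sided(sample_a, sample_b):
    """One-sided two-sample KS test (illustrative helper, not from the paper).

    Returns D+ = max over x of [F_a(x) - F_b(x)] and an asymptotic p-value.
    A large D+ suggests that sample_a is shifted toward smaller values.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    n, m = len(a), len(b)
    # Both ECDFs are step functions, so the maximum gap occurs at a data point.
    d_plus = max(0.0, max(ecdf_at(a, x) - ecdf_at(b, x) for x in a + b))
    # Standard large-sample approximation for the one-sided statistic.
    p_value = math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, p_value


# Hypothetical intron lengths in base pairs (toy values, not real data).
focal_introns = [50, 60, 70, 80]
other_introns = [100, 200, 300, 400]

d_stat, p_val = ks_one_sided(focal_introns, other_introns)
```

With real data, each sample would hold the full set of intron lengths for one species, and the asymptotic p-value is only a reasonable approximation because such samples are large.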
To examine the extent of synteny among beetle genomes, we used MCScanX (Wang et al. 2012) to measure the number of collinear genes found along scaffolds in pairwise comparisons. We focused on comparing the N. riversi assembly to the six most contiguous beetle genomes available on RefSeq: Aethina tumida (contig N50 = 299 Kb), Agrilus planipennis (scaffold N50 = 1.11 Mb), Onthophagus taurus (scaffold N50 = 337 Kb), Photinus pyralis (scaffold N50 = 47.02 Mb), Sitophilus oryzae (scaffold N50 = 2.86 Mb), and Tribolium castaneum (scaffold N50 = 4.46 Mb). We calculated the number of contiguous genes, as well as the proportion of contiguous genes, among the seven genomes. The pheatmap package in R (Kolde & Kolde 2015) was used to generate visualizations of the results.
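The pairwise summary described above can be sketched as follows. This is an assumed formulation rather than the procedure used in the study: the text does not define the proportion, so the example takes it to be the number of distinct genes of a genome that appear in at least one collinear pair, divided by the total number of annotated genes in that genome. The function name, gene identifiers, and gene counts are hypothetical, and the input is a plain list of gene pairs rather than a parsed MCScanX output file.

```python
def collinear_summary(pairs, n_genes_a, n_genes_b):
    """Count distinct collinear genes per genome and their proportions.

    pairs: iterable of (gene_in_genome_a, gene_in_genome_b) tuples.
    n_genes_a, n_genes_b: total annotated genes in each genome (assumed
    denominator for the proportion; the paper does not state one).
    """
    genes_a = {gene_a for gene_a, _ in pairs}
    genes_b = {gene_b for _, gene_b in pairs}
    return {
        "collinear_a": len(genes_a),
        "collinear_b": len(genes_b),
        "prop_a": len(genes_a) / n_genes_a,
        "prop_b": len(genes_b) / n_genes_b,
    }


# Hypothetical collinear gene pairs between two genomes (toy identifiers).
toy_pairs = [
    ("Nriv_g1", "Tcas_g7"),
    ("Nriv_g2", "Tcas_g8"),
    ("Nriv_g3", "Tcas_g9"),
    ("Nriv_g3", "Tcas_g10"),  # one gene may pair with more than one partner
]

summary = collinear_summary(toy_pairs, n_genes_a=10, n_genes_b=20)
```

Running this for every pair of genomes would fill a matrix of counts or proportions of the kind that a heatmap can display.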