Phylogeny and Gene Family Size Analysis
For the gene family expansion/contraction analysis, we first
reconstructed a phylogenomic tree for N. riversi and the 11
beetle species with genomes in RefSeq (O’Leary et al.2016), as well as Drosophila melanogaster (Insecta: Diptera,
hereafter Dmel) as an outgroup for the Coleoptera. The latest genomes
were used to identify single copy orthologs using BUSCO v2.0 with the
Endopterygota reference dataset. More than two thousand single copy
orthologs were discovered from the 13 genomes. We then used MAFFT v7.471
(Katoh & Standley 2013) to align the amino acid (aa) sequences of 13
species for each ortholog, and found 1,017 orthologs had sequences in
all 13 species. In order to reduce any effects of misalignment, we used
trimAl v1.2 (Capella-Gutiérrez et al. 2009) to remove
ambiguous bases and gaps. This trimming process reduced the total
alignment size from 1,021,369 (including gaps) to 382,948 aas. We then
used the R package kdetrees (Weyenberg et al.2014) to detect any gene tree outliers based on both topological
similarity and branch length similarity to all gene trees. After
removing 14 outliers, we concatenated the gene-specific alignments into
a single alignment (1,013 orthologous genes, with 379,463 aas in total)
and constructed a Maximum Likelihood (ML) tree using RAxML
v8.2.12 (Stamatakis 2014). In RAxML, we applied the
PROTGAMMAAUTO substitution model and designated D. melanogasteras the outgroup, and the branch support was evaluated with 100 bootstrap
replicates. The final ultrametric tree was generated by calibrating the
ML tree with fossil records using treePL (Smith & O’Meara
2012), with parameters obtained from the prime function. For fossil
records, we used the estimated age of Tshekardocoleidae (oldest known
beetle, 295 Mya) to define the minimum age of the most recent common
ancestor of all Coleopteran species (Kirejtshuk 2020). The minimum
estimation of crown Coleoptera (208.5 Mya) was used to define the
minimum age of Polyphaga (Wolfe et al. 2016). The oldest
Chrysomelidae fossil (Mesolpinus , 122 Mya) was used to define the
minimum age of Chrysomeloidea (Hu et al. 2020; Kirejtshuket al. 2015), whereas the fossil weevil (Nemonychidae, 157 Mya)
was used to define the maximum age of Chrysomeloidea (Behrensmeyer &
Turner 2013).
With the resulting ultrametric tree, we set out to analyze changes in
the size of gene families in N. riversi relative to other
beetles, while accounting for phylogenetic history, using CAFE v3.0
(Ganote et al. 2018). We first identified all one to one
orthologous genes among the twelve beetle species and Dmel (as outgroup)
using OMA standalone v2.4.1 (Altenhoff et al. 2019). The
input dataset included the precomputed all-against-all genomes of Tcas
and Dpon from the OMA database, and the proteome sequences of the
remaining ten beetle species and Dmel from RefSeq (O’Learyet al. 2016). This resulted in a count matrix of the number of
genes in each hierarchical orthologous group (HOG). Among the 18,438
HOGs, we dropped one HOG (HOG06033) that had more than 200 gene copies
to allow CAFE to find the optimized λ (rate of evolutionary change). We
identified all rapidly expanding and contracting HOGs using a p-value
threshold < 0.05. In addition, we relaxed this threshold to
consider all expanding HOGs on the branch of N. riversi , and
conducted a GSEA test using clusterProfiler (Yu et
al. 2012) in R. We examined two levels of cutoff (0.05 and 0.1) for
Benjamini-Hochberg adjusted p-values to identify enriched GO terms (Yuet al. 2012). The gene ontology terms associated with biological
processes were further clustered based on semantic similarity using the
Revigo webserver (Supek et al. 2011), using thesimRel score to collapse redundant terms at 0.7 similarity and
the UniProt database to determine the gene ontology term
abundance, and visualized using a CirGO plot (Kuznetsovaet al. 2019).