3.3 Genome annotation
A total of 79,136,004 bp of repetitive sequences were identified in the S. chinensis genome, corresponding to a repeat content of 29% (Table S6). A total of 14,089 protein-coding genes (15,987 transcripts) were predicted. Of the annotated genes, 97.37% were located on the 13 chromosome-level scaffolds (Figure 2B). The average CDS length, exon number per gene, exon length and intron length were 1,536 bp, 7.3, 212 bp and 910 bp, respectively, similar to those of most reported aphid species (Table S7, Figure S2). According to our results, 96.9%, 97.7%, 97.8% and 96.7% of BUSCO genes were identified in the S. chinensis genome/gene set in comparison with the Eukaryota, Arthropoda, Hemiptera and Insecta datasets, respectively, demonstrating the completeness of the gene set (Figure 4B). Up to 80% of the RNA-Seq reads were assigned to the gene set (Table S3). Among the 14,089 predicted genes, 12,584 (89.32%) were functionally annotated, including 9,272 (65.81%) genes annotated via the GO database and 7,285 (51.71%) genes via the KEGG database (Table 2). Non-coding RNAs (ncRNAs) were also identified in the S. chinensis genome, including 130 tRNAs, 29 rRNAs, 29 miRNAs, and 72 snRNAs (Table S8).
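The functional annotation percentages follow directly from the gene counts. As a minimal sketch (variable and function names are illustrative and not part of the original pipeline), the arithmetic can be checked by taking the 14,089 predicted genes as the denominator:

```python
# Gene counts as reported in the text (Table 2); all names are illustrative.
TOTAL_GENES = 14_089
annotated = {
    "any database": 12_584,
    "GO": 9_272,
    "KEGG": 7_285,
}


def percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Annotation rate per database, relative to all predicted genes.
annotation_rates = {name: percent(n, TOTAL_GENES) for name, n in annotated.items()}

for name, rate in annotation_rates.items():
    print(f"{name}: {rate}%")
```

Under this assumption the computed values agree with the reported 89.32%, 65.81% and 51.71%.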
3.4 Phylogenetic analysis
Protein sequences of S. chinensis and eight other closely related species were retrieved from public databases, along with B. tabaci as an outgroup. A total of 3,479 single-copy orthologous groups extracted by OrthoMCL were used to construct the phylogenetic tree. The results showed that S. chinensis was a sister taxon to the woolly apple aphid E. lanigerum. The two Eriosomatinae species diverged from their common ancestor approximately 57 million years ago (MYA) (Figure 5). Eriosomatinae and Aphidinae (including Ap. glycines, R. maidis, Ac. pisum, M. persicae and D. noxia) diverged from their common ancestor about 63 MYA, consistent with a previous study (Mather et al., 2020). Compared with the subfamily Chaitophorinae (including S. flava) in the family Aphididae, the subfamily Eriosomatinae has a closer relationship with the subfamily Aphidinae. Significant expansion and contraction of gene families are usually related to the adaptive divergence of species. To elucidate the key genomic changes associated with adaptation, expansion and contraction of gene families were analyzed in all nine aphids and B. tabaci. Eriosomatinae showed 40 expanded and 986 contracted gene families compared with those of the common ancestor of Aphidinae and Eriosomatinae (Figure S4A). KEGG and GO enrichment analyses suggested that most of the expanded genes were involved in the detoxification of natural xenobiotics from plants (Figure S4B, S4C). The S. chinensis genome displayed 235 expanded and 1,037 contracted gene families compared with those of the common ancestor. KEGG pathway enrichment analysis suggested that most of the expanded gene families were involved in the IL-17 signaling pathway, arachidonic acid metabolism, the NF-kappa B signaling pathway, ovarian steroidogenesis, the VEGF signaling pathway, necroptosis, regulation of lipolysis in adipocytes, the TNF signaling pathway, and the C-type lectin receptor signaling pathway (Figure S4E).
Similarly, GO annotation analysis revealed that most of the expanded gene families were involved in prostaglandin-endoperoxide synthase activity, arachidonate 15-lipoxygenase activity, nucleosomes, ovarian cumulus expansion, intrinsic apoptotic signaling pathway in response to osmotic stress, regulation of fever generation, regulation of platelet-derived growth factor production, response to lead ion, and chromatin assembly or disassembly (Figure S4D, Table S9). The expanded gene families of the S. chinensis genome were thus enriched not only in detoxification but also in immune-related functions.
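The single-copy orthologous groups used for the phylogenetic tree are, by definition, groups containing exactly one gene from every species in the analysis. The sketch below illustrates that selection step only; it is not the authors' pipeline. It assumes an OrthoMCL-style groups file in which each line has the form `group_id: taxon|gene taxon|gene ...`, and the species codes, group IDs and gene IDs are hypothetical.

```python
from collections import Counter


def parse_group(line: str) -> tuple[str, list[str]]:
    """Split 'group_id: sp|gene sp|gene ...' into (group_id, members)."""
    group_id, members = line.split(":", 1)
    return group_id.strip(), members.split()


def single_copy_groups(lines, species):
    """Return IDs of groups with exactly one gene from every listed species."""
    wanted = set(species)
    keep = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        group_id, members = parse_group(line)
        # Count genes per species using the taxon prefix before '|'.
        counts = Counter(m.split("|", 1)[0] for m in members)
        if set(counts) == wanted and all(c == 1 for c in counts.values()):
            keep.append(group_id)
    return keep


# Toy input with hypothetical species codes and gene IDs.
toy_species = ["schi", "elan", "btab"]
toy_groups = [
    "OG0001: schi|g1 elan|g7 btab|g3",          # single copy in all three
    "OG0002: schi|g2 schi|g9 elan|g8 btab|g4",  # duplicated in schi
    "OG0003: schi|g5 elan|g6",                  # missing btab
]
selected = single_copy_groups(toy_groups, toy_species)
print(selected)
```

In this toy example only the first group passes the filter. Groups with a duplication or a missing species are excluded, which is why the number of single-copy groups is typically much smaller than the total number of gene families.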