2.5 Gene annotation
To predict repetitive regions, RepeatMasker (version 4.1.1)
(Tarailo-Graovac & Chen, 2009) was used to screen the S. chinensis genome
against the Repbase library (Bao, Kojima, & Kurtz, 2015) with the
following command: RepeatMasker -pa 4 -e ncbi -species Hemiptera ch
-dir. In addition, an aphid-specific repeat library was generated using
RepeatModeler (version 2.0.1, with default parameters) to predict
transposons and other repetitive regions (Flynn et al., 2020). The
statistical results of the RepeatMasker and RepeatModeler analyses were
then combined.
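The text does not say how the two sets of repeat statistics were combined. One common approach, shown here as a minimal sketch and not as the authors' procedure, is to merge overlapping intervals from both annotation sets so that bases masked by both tools are counted once. The function names, the tuple layout, the 1-based inclusive coordinates and the example coordinates are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_repeat_intervals(*annotation_sets):
    """Merge overlapping repeat intervals per sequence.

    Each annotation set is an iterable of (seq_id, start, end) tuples.
    Coordinates are assumed to be 1-based and inclusive.
    """
    by_seq = defaultdict(list)
    for annotations in annotation_sets:
        for seq_id, start, end in annotations:
            by_seq[seq_id].append((start, end))

    merged = {}
    for seq_id, intervals in by_seq.items():
        intervals.sort()
        result = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= result[-1][1] + 1:  # overlapping or directly adjacent
                result[-1][1] = max(result[-1][1], end)
            else:
                result.append([start, end])
        merged[seq_id] = [tuple(interval) for interval in result]
    return merged


def masked_bases(merged):
    """Total number of bases covered by the merged intervals."""
    return sum(end - start + 1
               for intervals in merged.values()
               for start, end in intervals)


# Hypothetical hits from the library-based and the de novo runs.
library_hits = [("scaffold_1", 100, 200), ("scaffold_1", 500, 600)]
denovo_hits = [("scaffold_1", 150, 300), ("scaffold_2", 10, 50)]

combined = merge_repeat_intervals(library_hits, denovo_hits)
total = masked_bases(combined)
```

Dividing the merged total by the assembly length would give a combined repeat fraction that does not double count regions found by both tools.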
Gene structures were predicted using the GETA pipeline (version 2.4.2,
https://github.com/chenlianfu/geta), which merges the results of
RNA-seq-assisted, homology-based and ab initio methods. Briefly, in the
RNA-seq-assisted method, Illumina RNA-seq reads were aligned to the
assembled S. chinensis genome using HISAT2 (version 2.1.0.5)
(Kim et al., 2015). In the homology-based method, genes were predicted
by aligning homologous protein sequences to the genome using GeneWise
(version 2.4.1) (Birney, Michele, & Durbin, 2004). Augustus (version
2.5.5) was used to generate ab initio gene predictions
(Stanke et al., 2006; Blanco, Parra & Guigó, 2007). The gene prediction
results were then pooled and screened against the PFAM database.
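The pooling and screening step is described only briefly. As a sketch of one plausible reading, and not the documented behaviour of GETA, the example below pools gene models from the three methods and keeps a model if it is supported by transcript or homology evidence, or if it has at least one PFAM domain hit. The field names, the keep rule and the example records are assumptions made for illustration.

```python
def screen_gene_models(models, pfam_hit_ids):
    """Return the models that pass the assumed screening rule.

    models: list of dicts with keys 'id', 'method' and 'evidence_supported'.
    pfam_hit_ids: set of model ids with at least one PFAM domain hit.
    """
    kept = []
    for model in models:
        # Assumed rule: evidence support or a PFAM domain is enough to keep.
        if model["evidence_supported"] or model["id"] in pfam_hit_ids:
            kept.append(model)
    return kept


# Hypothetical pooled predictions from the three methods.
pooled = [
    {"id": "g1", "method": "rnaseq", "evidence_supported": True},
    {"id": "g2", "method": "homology", "evidence_supported": True},
    {"id": "g3", "method": "ab_initio", "evidence_supported": False},
    {"id": "g4", "method": "ab_initio", "evidence_supported": False},
]
pfam_hit_ids = {"g1", "g3"}  # hypothetical result of a PFAM domain search

kept_ids = [model["id"] for model in screen_gene_models(pooled, pfam_hit_ids)]
```

Under this rule, an unsupported ab initio model such as g4 is dropped, while g3 is rescued by its PFAM domain.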
To assign functions to the newly annotated genes in the S. chinensis
genome, the genes were aligned to sequences in the following databases:
NCBI Non-Redundant Protein Sequence (Nr), Non-Redundant Nucleotide
Sequence (Nt), SwissProt, Clusters of Orthologous Groups for eukaryotic
complete genomes (KOG), Integrated Resource of Protein Domains and
Functional Sites (InterPro), Gene Ontology (GO), Kyoto Encyclopedia of
Genes and Genomes Orthology (KEGG), and evolutionary genealogy of genes:
Non-supervised Orthologous Groups (eggNOG). A local Blast2GO database
was also built for GO annotation, and the results were processed with
Blast2GO (version 2.5). The KEGG Automatic Annotation Server (KAAS) was
used to annotate the S. chinensis genome sequence, with the
bi-directional best hit (BBH) method selected.
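The idea behind a bi-directional best hit assignment can be illustrated with a short sketch. It assumes two tables of (query, subject, bit score) rows, one from searching the query genes against reference genes and one from the reverse search, and it pairs a gene with a reference gene only when each is the other's top-scoring hit. This is a simplified illustration: the function names and example scores are hypothetical, and any score thresholds applied by KAAS itself are not modelled.

```python
def best_hits(rows):
    """Return {query: (subject, bitscore)} keeping the top-scoring subject."""
    best = {}
    for query, subject, bitscore in rows:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def bidirectional_best_hits(forward_rows, reverse_rows):
    """Pair each gene with a reference gene only if the best hit is mutual."""
    forward = best_hits(forward_rows)
    reverse = best_hits(reverse_rows)
    pairs = {}
    for gene, (reference, _score) in forward.items():
        if reference in reverse and reverse[reference][0] == gene:
            pairs[gene] = reference
    return pairs


# Hypothetical search results: (query, subject, bit score).
forward_rows = [
    ("geneA", "ref1", 200.0),
    ("geneA", "ref2", 150.0),
    ("geneB", "ref1", 180.0),
]
reverse_rows = [
    ("ref1", "geneA", 200.0),
    ("ref1", "geneB", 180.0),
    ("ref2", "geneA", 150.0),
]

pairs = bidirectional_best_hits(forward_rows, reverse_rows)
```

In this example only geneA is paired, with ref1. The best hit of geneB is also ref1, but the best hit of ref1 is geneA, so geneB is left unassigned. This is the conservative behaviour that distinguishes a bi-directional method from a single-directional one.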