3.1 Genome sequencing and de novo assembly
The k-mer (K = 17) analysis indicated that the heterozygosity of S. chinensis was 0.786% and that the estimated genome size was 274,512,001 bp (Figure S3). Sequencing of the fundatrigenia genome on the PacBio Sequel II platform generated 130 Gb of raw data with a read N50 length of 21,033 bp. The raw contig-level assembly comprised 304,774,269 bases in 1,409 contigs, with an N50 length of 2,961,835 bp (Table 1). After heterozygous sequences were removed, the final contig-level assembly was 271,416,320 bp long and consisted of 378 contigs with an N50 length of 4,333,385 bp (Table 1).
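For readers unfamiliar with the N50 statistic reported above, the following is a minimal sketch of how it is conventionally computed: lengths are sorted from longest to shortest, and the N50 is the length of the sequence at which the running total first reaches half of the summed length. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the assembly pipeline used in this study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Illustrative helper (hypothetical, not part of the study's pipeline).
    """
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2            # half of the assembly size
    running = 0
    for length in ordered:
        running += length
        # The first sequence that brings the running total to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length


# Toy lengths (made up for illustration): total 300, half 150.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 reaches the halfway point, so N50 = 70
```

Applied to the lengths of all contigs or scaffolds in an assembly, the same calculation yields the contig or scaffold N50, respectively.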
The chromosome-level assembly was generated using Hi-C data (Table S1) and had a total length of 271,524,833 bp with a scaffold N50 of 20,405,002 bp (Table 1). More than 97% of the total genome bases were successfully anchored to the 13 chromosomes (Figure 2). The remaining 2.8% of the sequence comprised 341 small scaffolds (Table 1). Chromosome lengths ranged from 14,859,000 bp to 10,104,278 bp. As revealed by BUSCO analyses against the Eukaryota, Arthropoda, Insecta and Hemiptera datasets, the S. chinensis genome assembly contained a higher number of conserved single-copy Arthropoda genes than any other published aphid genome, indicating that our assembly is complete and of high quality (Figure 4A). The genomic short reads were mapped to the assembled genome sequences, resulting in a 97.79% mapping rate and 60 Gb average sequence depth (Table S2). RNA-seq of seven samples (fundatrix, fundatrigeniae, autumn migrants, nymphs, spring migrants (sexuparae), and male and female sexuales) on the Illumina platform generated a total of 124.22 Gb of raw data, and more than 86% of the assembled RNA-seq transcripts were mapped to the genome (Table S3). Altogether, 260,508 transcripts (280,520,495 bp in total) were assembled by Trinity (Table S4).