2.3 Genome assembly
Illumina paired-end reads were used for k-mer analysis (k = 17) to estimate genome size and heterozygosity. Specifically, k-mer counts and their distribution were calculated using Jellyfish (version 1.1.10, parameters set to -C, -m 17, -s 10G, -t 80) (Marcais & Kingsford, 2011), and genome characteristics were estimated and visualized using GenomeScope (version 2.0, parameters set to 12, 150) (Ranallo-Benavidez, Jaron, & Schatz, 2020). PacBio sequencing data were used to assemble the draft genome using Wtdbg2 (version 2.5, parameters set to -t 8, -p 21, -S 4, -s 0.05, -g 274m, -L 5000) (Ruan & Li, 2020). Potential contaminant sequences from bacteria, fungi and other microorganisms were identified by aligning the genome sequences to the Nt database and were removed. Both long and short reads were used to correct base errors in the draft genome using NextPolish (Hu, Fang, Su, & Liu, 2019). HaploMerger2 (with default parameters) and purge_haplotigs (parameters set to -m 4G; -t 60; -l value1, -m value2, -h value3; -t 60, -a 70) were used to remove redundant heterozygous regions from the genome (Huang, Kang, & Xu, 2017; Roach, Schmidt, & Borneman, 2018).
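As a rough illustration of the k-mer approach described above, the sketch below estimates genome size from a k-mer depth histogram as the total number of k-mers divided by the depth of the main peak. This is a simplified first-order estimate, not the model that GenomeScope fits. The function names, the two-column "depth count" input format and the `min_depth` cutoff for discarding error k-mers are assumptions made for illustration. In a highly heterozygous genome the tallest peak may be the heterozygous peak, in which case this estimate would be inflated.

```python
from typing import Dict, Iterable


def parse_histogram(lines: Iterable[str]) -> Dict[int, int]:
    """Parse 'depth count' lines (assumed two whitespace-separated columns)."""
    histogram: Dict[int, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        depth, count = int(fields[0]), int(fields[1])
        histogram[depth] = count
    return histogram


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 5) -> float:
    """Estimate genome size as total k-mers / peak depth.

    min_depth is a hypothetical cutoff: k-mers below this depth are
    treated as sequencing errors and ignored.
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the largest number of distinct k-mers (the main peak).
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    toy = parse_histogram(["1 1000", "2 500", "9 10", "10 100", "11 10"])
    print(estimate_genome_size(toy, min_depth=5))
```

The toy histogram is invented for demonstration only and does not correspond to any data in this study.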
To construct the chromosome-level genome assembly, Hi-C reads were aligned to the haploid genome assembly using Juicer (version 1.5, with default parameters). An initial chromosome-level assembly was generated using the 3D de novo assembly (3D-DNA) pipeline (version 180114) with the parameter “-r 3” (Dudchenko et al., 2017). The resulting assembly was then reviewed using Juicebox Assembly Tools (JBAT, version 1.11.0, with default parameters) to produce the final chromosome-level assembly (Dudchenko et al., 2018). The completeness of the genome assembly was assessed using BUSCO (v5.1.3) (Waterhouse et al., 2018) by searching for universal single-copy orthologous genes from the Eukaryota, Arthropoda, Insecta and Hemiptera datasets (odb10). The final assembly was validated by mapping the Illumina reads and RNA sequencing (RNA-seq) reads with Bowtie2 (Table S1).
2.4 Localization of the sex chromosomes and autosomes
The mapped reads per million (MRPM) of each chromosome were calculated separately for female and male Illumina reads to distinguish the sex chromosomes from the autosomes (Ye et al., 2021). The normalized read count of the X chromosome is expected to be approximately twice as high in females as in males, because males have only one copy of the X chromosome, whereas females have two copies. Both males and females have two copies of each autosome, so the female-to-male ratio of normalized read counts for autosomes is expected to approach 1 (Pal & Vicoso, 2015). Male and female DNA reads were mapped separately to the genomic scaffolds using Bowtie2 with default parameters (Langmead & Salzberg, 2012). The resulting alignments were then filtered to remove low-quality mapped reads using SAMtools view (-b -q 30). The read counts of each chromosome were calculated using SAMtools idxstats (Li et al., 2009). The sex chromosomes were then verified by comparison with other species. Syntenic blocks of genes were identified among the chromosome-level genome assemblies of S. chinensis, Acyrthosiphon pisum, Rhopalosiphum maidis and E. lanigerum using MCScanX and visualized with the Dual Synteny Plotter for MCScanX in TBtools (version 1.09; Chen et al., 2020) (Table S1).
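The read-depth comparison described above can be sketched as follows. This is a minimal illustration, assuming that MRPM is defined as the number of reads mapped to a chromosome per million reads mapped to all chromosomes; the exact normalization used in the study is not specified here. The input is the standard four-column, tab-separated output of SAMtools idxstats (reference name, length, mapped reads, unmapped reads). The function names and the toy chromosome names and counts are hypothetical.

```python
from typing import Dict, Iterable


def parse_idxstats(lines: Iterable[str]) -> Dict[str, int]:
    """Return mapped read counts per reference from idxstats-style lines."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            continue  # the '*' row holds reads with no reference assigned
        counts[name] = int(mapped)
    return counts


def mrpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Mapped reads per million total mapped reads (assumed definition)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {name: n * 1e6 / total for name, n in counts.items()}


def female_to_male_ratio(
    female: Dict[str, int], male: Dict[str, int]
) -> Dict[str, float]:
    """Female/male MRPM ratio per chromosome present in both samples."""
    f_norm, m_norm = mrpm(female), mrpm(male)
    return {
        name: f_norm[name] / m_norm[name]
        for name in f_norm
        if name in m_norm and m_norm[name] > 0
    }


if __name__ == "__main__":
    # Toy counts invented for illustration only.
    female = parse_idxstats(["chr1\t1000\t800\t0", "chrX\t500\t200\t0", "*\t0\t0\t5"])
    male = parse_idxstats(["chr1\t1000\t800\t0", "chrX\t500\t100\t0", "*\t0\t0\t7"])
    for name, ratio in female_to_male_ratio(female, male).items():
        print(name, round(ratio, 2))
```

Because each sample is normalized by its own total, the ratios for the X chromosome and the autosomes fall near, rather than exactly at, 2 and 1; chromosomes with a clearly elevated ratio are the X-chromosome candidates.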