3.2 Genome assembly and completeness of the assembled genome
The PacBio Sequel II platform generated 219.37 Gb of long sequencing reads (156.92-fold coverage), corresponding to 18,433,125 subreads, of which 16,799,937 were longer than 2 kb. The maximum length, mean length, N50, N90, and GC content of the long PacBio reads were 264,033 bp, 11,900 bp, 17,148 bp, 6,872 bp, and 43.60%, respectively. The long reads were assembled into 27,518 contigs totaling 1,507,921,137 bp, of which 27,517 were longer than 2 kb. The maximum length, N50, N90, and GC content of the assembled contigs were 1,759,181 bp, 107,376 bp, 23,463 bp, and 40.90%, respectively. After the contigs were polished with short Illumina reads and long PacBio reads, 27,223 corrected contigs (1,529,349,659 bp) were obtained, of which 27,220 were longer than 2 kb. The maximum length, N50, N90, and GC content of the corrected contigs were 1,758,737 bp, 109,775 bp, 24,031 bp, and 41.20%, respectively. Redundant sequences were then removed, yielding a 1,429.38 Mb genome with a contig N50, N90, and GC content of 118,954 bp, 30,162 bp, and 41.30%, respectively. Approximately 240.85 Gb of clean Hi-C data were generated, and 91.42% of the contigs were anchored to 51 chromosomes, bringing the assembly to the chromosome level. The final chromosome-level C. japonica genome after Hi-C-assisted assembly was 1,431.02 Mb with a contig N50 of 29.67 Mb (Figure 2).
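For readers less familiar with the N50 and N90 statistics reported above, the following minimal Python sketch shows how such values are conventionally computed from a list of sequence lengths. It is an illustration of the standard definition only, not the software used in this study; the function name `nx` and the toy lengths are hypothetical.

```python
def nx(lengths, percent):
    """Return the Nx statistic for a list of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least `percent` percent
    of the total assembled bases (percent=50 gives N50, 90 gives N90).
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")

    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer arithmetic avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length
    return min(lengths)  # not reached for valid input


# Hypothetical toy contig lengths (bp); total = 300 bp.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy_contigs, 50)
n90 = nx(toy_contigs, 90)
print("N50:", n50, "N90:", n90)
```

Because N90 requires covering a larger share of the assembly, it can never exceed N50 for the same set of sequences, which is consistent with every N50/N90 pair reported in this section.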
The short Illumina reads and long PacBio reads were then mapped back to the assembled C. japonica genome to assess sequence consistency. In total, 94.11% of the short Illumina reads and 87.67% of the long PacBio reads mapped successfully to the assembled genome (Table 1). Based on the Arthropoda gene set, 86.40% complete BUSCOs were found in the C. japonica genome, comprising 82.27% complete and single-copy BUSCOs and 4.13% complete and duplicated BUSCOs (Table 2).
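Some of the figures reported in this section can be cross-checked with simple arithmetic. The sketch below, in Python, uses only values stated in the text; the variable names are illustrative. It assumes the usual definition of fold coverage as total bases divided by genome size, and the text does not state which genome size estimate was used for that calculation. The combined share of BUSCOs that are not complete is a derived figure and is not broken down into fragmented and missing categories here.

```python
from decimal import Decimal

# BUSCO percentages as reported in the text.
single_copy = Decimal("82.27")
duplicated = Decimal("4.13")
complete_reported = Decimal("86.40")

# Complete BUSCOs should equal single-copy plus duplicated.
complete_sum = single_copy + duplicated

# Derived: share of BUSCOs that are not complete (fragmented + missing).
not_complete = Decimal("100") - complete_reported

# Genome size implied by the PacBio yield and the stated fold coverage,
# assuming coverage = total bases / genome size.
pacbio_yield_gb = 219.37
fold_coverage = 156.92
implied_genome_gb = pacbio_yield_gb / fold_coverage

print("Complete BUSCOs (sum):", complete_sum)
print("Not complete (derived):", not_complete)
print("Implied genome size (Gb):", round(implied_genome_gb, 2))
```

The implied genome size of roughly 1.40 Gb is close to the final assembly size of 1,431.02 Mb.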