3.2 Genome assembly and completeness of the assembled genome
The PacBio Sequel II platform generated 219.37 Gb (156.92-fold coverage)
of long sequencing reads, corresponding to 18,433,125 subreads. Of these
subreads, 16,799,937 were longer than 2 Kb. The
maximum length, mean length, N50, N90, and GC content of the long PacBio
reads were 264,033 bp, 11,900 bp, 17,148 bp, 6,872 bp, and 43.60%,
respectively. The long reads were assembled into 27,518 contigs with a
total length of 1,507,921,137 bp. Among these contigs, 27,517 were
longer than 2 Kb. The maximum length, N50, N90, and GC
content of the assembled contigs were 1,759,181 bp, 107,376 bp, 23,463
bp, and 40.90%, respectively. After the contigs were polished with
short Illumina reads and long PacBio reads, 27,223 corrected contigs
(1,529,349,659 bp) were obtained, of which 27,220 were longer than 2
Kb. The maximum length, N50, N90, and GC content of the
corrected contigs were 1,758,737 bp, 109,775 bp, 24,031 bp,
and 41.20%, respectively. Redundant sequences were subsequently
removed, producing a 1429.38 Mb genome with a contig N50, N90, and GC
content of 118,954 bp, 30,162 bp, and 41.30%, respectively.
Approximately 240.85 Gb of clean Hi-C data were generated, and 91.42% of
the contigs were anchored to 51 chromosomes, bringing the assembly to
chromosome level. The final chromosome-level C. japonica genome after
Hi-C data-assisted assembly was 1431.02 Mb with a contig N50 size of
29.67 Mb (Figure 2).
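
The N50 and N90 values reported above follow the usual definition: the
length L such that sequences of length L or longer contain at least 50%
(or 90%) of all assembled bases. The following is a minimal sketch of
that calculation, not the pipeline used in this study; the function name
nx and the toy length list are assumptions introduced only for
illustration.

```python
def nx(lengths, percent):
    """Return the Nx statistic for a list of sequence lengths.

    Nx is the length L such that sequences of length >= L together
    contain at least `percent` % of the total bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length


# Hypothetical toy contig lengths (bp), not data from this study.
toy_lengths = [100, 80, 60, 40, 20]
n50 = nx(toy_lengths, 50)
n90 = nx(toy_lengths, 90)
print(n50, n90)
```

With real data, the lengths would be read from the contig FASTA file
before calling nx.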
The short Illumina reads and long PacBio reads were further compared
with the assembled C. japonica genome sequences to assess the
consistency of sequences. A
comparison of the results showed that 94.11% of the short Illumina
reads and 87.67% of the long PacBio reads were successfully mapped to
the assembled genome (Table 1). Based on the Arthropoda gene set,
86.40% complete BUSCOs were found in the C. japonica genome,
including 82.27% complete and single-copy BUSCOs and 4.13%
complete and duplicated BUSCOs (Table 2).
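
The BUSCO percentages above can be checked for internal consistency,
since the complete fraction should equal the sum of the single-copy and
duplicated fractions. The short sketch below performs that arithmetic
with the values quoted in the text; the variable names are illustrative,
and the remaining fraction (fragmented plus missing) is derived here by
subtraction rather than reported in the text.

```python
from decimal import Decimal

# Percentages quoted in the text (Arthropoda gene set).
single_copy = Decimal("82.27")
duplicated = Decimal("4.13")
reported_complete = Decimal("86.40")

# Complete BUSCOs = complete single-copy + complete duplicated.
complete = single_copy + duplicated

# Derived value: everything not complete is fragmented or missing.
not_complete = Decimal("100") - complete

print(complete, complete == reported_complete, not_complete)
```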