2.5 C. japonica genome assembly and evaluation
The Wtdbg2 software (Ruan and Li, 2020) was used to assemble the C.
japonica genome from PacBio long sequencing reads with the following
parameters: best depth from input reads, 50.0; Kmer psize, 21;
readCutoff, 1k. Although PacBio long reads are reliable, sufficient
sequencing depth is still required to ensure the accuracy of the consensus.
First, the long PacBio reads were used to polish the consensus sequence
output by the Wtdbg2 software. Specifically, the pbmm2 (Chaisson and
Tesler, 2012) and minimap2 software (Li, 2018) were used to align the
long PacBio reads to the consensus sequences, and the consensus was then
corrected from these alignments using the Arrow and Racon methods
(Walker et al., 2014).
Next, clean Illumina reads were aligned to the PacBio-polished genome
sequence using the BWA software (version 0.7.10-r789; Li and Durbin,
2009), and the sequence was corrected using the Pilon software (Walker
et al., 2014). Finally, redundant sequences were removed from the
corrected C. japonica genome according to the read depth distribution
and sequence similarity. The filtered Hi-C
reads were mapped to the polished C. japonica genome to detect
positional and orientation errors in contigs during 3D-DNA assembly
(Dudchenko et al., 2017). The Juicebox software (Durand et al., 2016)
was used to adjust the order and orientation of some contigs and to help
determine chromosome boundaries. Genomic overlap was identified based on
sequence homology and long-distance interaction patterns, and the
chromosome-level C. japonica genome was obtained.
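The read-selection parameters listed above (best depth, 50.0; readCutoff, 1k) can be illustrated with a minimal Python sketch. It assumes that "best depth" means keeping the longest reads until their total length reaches the target depth multiplied by the genome size, after discarding reads shorter than the cutoff; the actual selection logic inside Wtdbg2 may differ. The function name `select_reads`, the read lengths, and the genome size are hypothetical and chosen only for illustration.

```python
def select_reads(read_lengths, genome_size, best_depth=50.0, read_cutoff=1000):
    """Pick the longest reads until best_depth x genome_size bases are collected.

    Illustrative only: this is an assumed reading of the 'best depth' and
    'readCutoff' parameters, not the Wtdbg2 implementation.
    """
    target_bases = best_depth * genome_size
    # Drop reads shorter than the cutoff, then rank the rest longest first.
    candidates = sorted(
        (i for i, length in enumerate(read_lengths) if length >= read_cutoff),
        key=lambda i: read_lengths[i],
        reverse=True,
    )
    kept, total = [], 0
    for i in candidates:
        if total >= target_bases:
            break  # target depth reached
        kept.append(i)
        total += read_lengths[i]
    return kept, total


# Toy input: seven hypothetical read lengths and a 1 kb "genome".
lengths = [500, 1200, 8000, 15000, 3000, 950, 22000]
kept, total = select_reads(lengths, genome_size=1000, best_depth=40.0)
print(kept, total)
```

With a real data set, the read lengths would come from the PacBio subreads and the genome size from an independent estimate.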
Three approaches were used to evaluate the quality of the C. japonica
genome assembly. First, the genome sequence was split into fragments
with a step length of 1000 bp, and the fragments were searched against
the nucleotide sequence (NT) database using the Basic Local Alignment
Search Tool (BLAST) software to evaluate the accuracy of the genome
sequences. Second, the BWA
(version 0.7.10-r789; Li and Durbin, 2009) and minimap2 software (Li,
2018) were used to align the short Illumina reads and long PacBio reads
to the genome sequence, respectively. The consistency between the
sequencing reads and the genome sequence was evaluated according to the
mapping rate. Additionally, the
completeness of conserved genes in the C. japonica genome was evaluated
using the Benchmarking Universal Single-copy Orthologs (BUSCO, version
2.0; Simao et al., 2015) based on the orthologous gene database of
arthropods. Meanwhile, RNA-seq reads were aligned to the genome using
the hisat2 software (Vaser et al., 2017) to assess genomic integrity.
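Two of the evaluation steps above reduce to simple operations and can be sketched in Python. The sketch assumes that the 1000-bp step length corresponds to consecutive, non-overlapping fragments (the text gives a step length but no separate window size), and that the mapping rate is the percentage of reads that align to the genome. The function names and the toy values are hypothetical.

```python
def fragment_sequence(seq, step=1000):
    """Cut a sequence into consecutive pieces of `step` bases.

    Returns (start, fragment) pairs; the last fragment may be shorter.
    Assumes window size equals step length (non-overlapping fragments).
    """
    return [(start, seq[start:start + step]) for start in range(0, len(seq), step)]


def mapping_rate(mapped_reads, total_reads):
    """Percentage of reads that aligned to the genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


# Toy 2400-bp sequence standing in for one contig.
contig = "ACGT" * 600
fragments = fragment_sequence(contig, step=1000)
starts = [start for start, _ in fragments]
sizes = [len(frag) for _, frag in fragments]
print(starts, sizes)

# Hypothetical read counts, for illustration only.
rate = mapping_rate(mapped_reads=980, total_reads=1000)
print(rate)
```

In practice the fragments would be written to a FASTA file for the BLAST search, and the read counts would be taken from the alignment statistics reported by the aligner.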