3 Results and Discussion
Genome assembly
We adopted second-generation and
third-generation sequencing for the comprehensive analysis of theR. pedestris genome. A total of 118.9 Gb clean data was generated
using the Illumina
platform.
The k-mer (k=19) analysis indicated that
theR. pedestris genome had high heterozygosity and the estimated
size was 1.04 Gb (Figure S1). Using the PacBio RS II platform, a total
of 132 Gb clean data was generated with an N50 read length of 18.1 kb
(Table S2). Based on the sequencing data above, we obtained a
contig-level assembly that contained
934 contigs and 1,211,875,501
bases. The contig N50 reached 13,968,850 bp, which is significantly
longer than the genome assemblies of other bugs (Table 1; Table S3).
Then, we used Hi-C long‐range scaffolding to improve our assembly. A
total of 105 Gb raw data was generated, and 82-Gb filtered clean reads
were used for the following analysis (Table S4). As a result, more than
95.7% of the total genome bases were successfully anchored to 6 unique
chromosomes, with the chromosomic lengths ranging from 162.5 Mb to 285.2
Mb (Figure 2A; Figure
3).
Chromosomic staining was performed to validate the Hi-C result, and we
identified 12 chromosomes (n=5A+XX) in female R. pedestris and 11
chromosomes (n=5A+X) in male R. pedestris (Figure 2B). A previous
work demonstrated that the sex chromosome system of most insects in the
family Coreidae is XO (Muramoto 1978). The concordant results between
Hi-C assembly and chromosomic staining confirmed the accuracy of our
genome assembly and indicated thatR. pedestris is XO-determinant.
There were 745 scaffolds that failed to anchor. These unlocalized
scaffolds were further blasted against the bacteria database to identify
the potential bacteria that resided within R. pedestris . A total
of 168 scaffolds showed high homology to known bacteria, and these
bacterial-origin scaffolds were used for further analysis (Table S5).
Taking these results together, we assembled the R. pedestrisgenome (containing 6 chromosomes and 577 unlocalized scaffolds) with a
final size of 1.193 Gb (GC content: 33.8%) and a scaffold N50 of 181.3
Mb (Table 1).
To assess the completeness of the genome assembly, the
assembled genome was subjected to
BUSCO analyses against insect,
arthropod, metazoan, and eukaryote datasets. The results showed that
94.7% (against the insect dataset) to 96.5% (against the arthropod
dataset) of BUSCO genes could be identified in the R. pedestrisgenome, suggesting the completeness and high quality of our genome
assembly (Figure S2). There were 2,774,084 repetitive sequences
identified, accounting for 56.0% of the R. pedestris genome
(Table S6). Additionally, we identified 1,266,644 SSRs in the genome
assembly (Table S7).
Identification of the Serratia marcescens genome
R. pedestris is well known for its intimate association withBurkholderia , which provides beneficial effects on insect
development (Takeshita and Kikuchi 2017). Our genome assembly did not
detect any Burkholderia -origin sequence, indicating the absence
of Burkholderia in this insect strain. There were 168
bacterial-origin scaffolds from Serratia , Enterobacter ,Kluyvera , and Wolbachia in our genome assembly (Table S5).Serratia was reported to confer host resistance to invading
parasitoids (Bai et al. 2019), while Wolbachia supplies essential
nutrition for host reproduction (Ju et al. 2020). Although the insect
used for genome sequencing was free of Burkholderia , we did not
observe any abnormal phenotype previously described in aposymbiotic
insects (Kikuchi et al. 2007). We presumed that additional bacteria,
other than Burkholderia , provide specific nutritional or other
contributions to R. pedestris , which is an aspect that deserves
further investigation.
Among the bacterial-origin scaffolds, the longest scaffold (sc00000)
could be successfully assembled into a circle, and we mainly focused on
this complete bacteria genome in the present study. Phylogenetic
analysis showed that the sc00000 was closely related to Serratia
marcescens Db11 (Figure 4). S. marcescens is a gram-negative
bacillus that belongs to Enterobacteriaceae . It has a wide host
range, including plants, insects and animals (Grimont and Grimont 1978).
In R. pedestris , S. marcescens was reported to colonize
the insect midgut, but its function remained unknown (Lee et al. 2017).
Our study, for the first time, assembled the complete genome of S.
marcescens from R. pedestris , and we tentatively named this
bacterial strain as S. marcescens Rip1 (hereafter: Rip1). The
complete genome of Rip1 comprises a single circular chromosome of 5.04
Mbp, with an overall G+C content of 59.4% (Figure 4). SomeSerratia strains, such as S. marcescens CAV1761 andS. marcescens AS-1, contain additional plasmids (Sakuraoka et al.
2019). However, no sequence homologous to these plasmids was found in
Rip01. Overall, the genomes of Rip1 and Db11 are highly conserved and
essentially collinear (Figure S3).
Previous work demonstrated that S. marcescenssecretes virulence factors to
counter antimicrobial proteins produced by R. pedestris , thus
escaping host immune defenses (Chen et al. 2017; Lee et al. 2017). We
identified at least 15 antibiotic-resistance genes and 32 putative
virulence factors in Rip1 (Figure 4, Table S8, Table S9). These genes
confer antibiotic resistance to a wide variety of antibiotics, including
fosfomycin, mupirocin, polymyxin, aminocoumar, beta-lactam, isoniazid,
and fluoroquinolone, and might be key in the successful colonization ofS. marcescens in the host midgut.
Genome annotation
Oxford Nanopore technology (ONT)
was used to aid the genome annotation. A total of 22 Gb clean data in
25,876,282 ONT subreads was obtained, with an N50 read length of 1,027
bp (Table S10). In total, we identified 21,562 non-redundant
protein-coding genes using the LoReAn annotation pipeline. The number of
predicted genes in R. pedestris was similar to those in G.
buenoi and O. fasciatus but much higher than that in C.
lectularius . Quality control showed that the predicted gene set was
able to recover 92.0% (against the insect dataset) to 96.7% (against
the eukaryote dataset) of BUSCO genes (Figure S4). The majority
(21,139 genes, 98.0%) of the
predicted genes were distributed within the 6 chromosomes. Among the
21,562 predicted genes, 17,909 genes
were successfully annotated by at
least one database in the NCBI Nr, Swiss-Prot, KOG, KEGG, GO, and
InterPro domain databases (Figure S5-S8; Table S11). Additionally, we
identified 10 ncRNAs, 51 rRNA, 4,103 tRNA, 221 small nuclear RNA, 80
microRNA, and 3,202 long ncRNAs in the R. pedestris genome (Table
S12).
Expression patterns of predicted genes
RNA-seq data from 6 different
tissues and 37
different development stages were
analyzed (Table S13), and a total of 21,320 genes were determined to be
transcribed (Supplementary files 1). For tissue-specific analysis, there
were 498, 21, 401, 254, 2389, and 3181 genes found to be mainly
expressed in the midgut, fat body, carcass, muscle, testis, and ovary,
respectively (Figure S9A). Enrichment analysis demonstrated that most
midgut-specific genes were involved in metabolism, exosomes, and
proteasomes (Figure S10A), while most carcass-specific genes were
involved in structural constituents of the cuticle, environmental
adaption, amino acid metabolism, odorant binding and sensory perception
(Figure S10B). The muscle-specific genes were mainly associated with the
cytoskeleton, ion channels, lipid metabolism, and protein kinases
(Figure S10C). There were abundant genes specifically expressed in
gonads. For testis-specific analysis, the enriched pathways were mainly
annotated as being associated with signaling and cellular processes,
transporters, carotenoid biosynthesis, chromosome proteins, and
cytochrome P450s (Figure S10D). For ovary-specific analysis, enriched
pathways were associated with chromosome proteins, DNA metabolism,
glycosyltransferase, and glycan metabolism (Figure S10E). Additionally,
we used LC-MS/MS to identify proteins authentically expressed in these 6
tissues (Figure S9B). A total of 2,622 proteins was successfully
identified, with a unipep≥2 (Supplementary files 2).
For the developmental analysis, a weighted gene co-expression network
analysis (WGCNA) successfully
assigned 9,887 genes into 32 modules (Figure S11). Among them, 6,684
genes (belonging to three modules) were more highly expressed in the egg
period than during other stages. There were 1,467 and 227 genes that
showed the highest expression in the male and female adult period,
respectively. Additionally, we identified 352 genes (belonging to three
modules) associated with insect molting. For example, insulin-like
growth factor-binding protein 7, replication factor C, and endocuticle
structural glycoprotein were more highly expressed before insect
molting, while N-acetylneuraminate lyase, partitioning defective 3, and
cytochrome P450 6a14 were more highly expressed during insect molting
(Supplementary files 1). These data provide a good resource for future
functional studies of R. pedestris.
Identification of the sex
chromosomes and autosomes
To identify the potential sex chromosome, Illumina reads from male and
female R. pedestris were mapped to the reference genome,
respectively. In the XO system, males and females have only an X
chromosome, and the MRPM of the X chromosome in females should be nearly
two-fold higher than that in males (Ma et al. 2020). Based on this
presumption, the shortest chromosome was determined to be the X
chromosome, with nearly double the MRPM in females compared to that in
males (1,414,872 versus 859,724). For the other 5 chromosomes, the
female/male ratio of the MRPM ranged from 0.87 to 0.97, which is
suggestive of autosomes (Figure 3, Table S14).
Gene family identification and phylogenetic tree construction
To investigate the phylogenetic
position of R. pedestris , protein data from 12 hemipteran insects
were retrieved from public database, and D. melanogaster was used
to root the tree. A total of 23,850 gene families were clustered by
OrthoMCL, and we selected 963 single copy genes with 356,827 reliable
sites to construct the phylogenetic tree. The results showed thatR. pedestris is a sister taxon to the milkweed bug O.
fasciatus , and the 2 species diverged from their common ancestor at
approximately 35 million years ago (Figure 5). Three hematophagous bugs
(R. prolixus , C. lectularius , and T. rubrofasciata )
form a sister lineage, and they clustered with three herbivorous bugs in
Heteroptera. There were 27, 216,
and 45 gene families identified as Sternorrhyncha-specific,
Auchenorrhyncha-specific, and Heteroptera-specific, respectively.
Gene-family expansion/contraction analysis demonstrated a higher
frequency of gene contraction in Heteroptera. Compared with O.
fasciatus , 817 gene families in R. pedestris underwent
expansion, while 1130 gene families in R. pedestris underwent
contraction (Figure 5).
Chromosome synteny
T. rubrofasciata (680 Mb in genome size), which is the only
chromosomal-level bug genome available in the Heteroptera suborder, was
reported to have 13 chromosomes, while R. pedestris has only 6
chromosomes. Synteny analysis demonstrated some fusion or fission events
between the two bug species (Figure 6). For example, R. pedestrisChr1 contains segments from T. rubrofasciata Chr3, Chr5, Chr6,
Chr7, and Chr11, and R. pedestris Chr2 contains segments fromT. rubrofasciata Chr1 and Chr4. The sex chromosome of R.
pedestris showed high synteny with T. rubrofasciata Chr12 and
Chr13. It is likely that Chr12 or Chr13 might be the sex chromosome ofT. rubrofasciata , which deserves further investigation.