Results and Discussion

Genome Quality Metrics

The H. glycines genome assembly comprises 2,109 contigs, all of which were incorporated into the expected 9 pseudomolecule scaffolds using the Juicer pipeline, in agreement with cytological observations [41, 94]. The genome size of the new assembly is 157,982,452 bp, within the expected range for this clade of species (Table 1). However, this increased genome size may be due to the incorporation of repetitive haplotigs, an assumption supported by the inflation of repeats (61.4 Mb) compared to the previous TN10 H. glycines draft (42.1 Mb). Still, total repeat content (38.9%) is within the published range (34-47.7%) [40, 41], and yet maintains a lower repeat content than the X12 assembly (67.3 Mb) (Supplementary Table 1).
To assess quality and completeness, the input sequences were aligned to the assembly. High rates of alignment: 97.3%, 97.2%, and 73.5%, were observed for Pacbio subreads, Pacbio CCS reads, and 260bp PE Illumina reads, respectively. To evaluate the genic complement of the annotation, we ran BUSCO3 (Benchmarking Universal Single Copy Orthologs) [86] on both the genome and its predicted proteins. Of the 982 possible Nematoda BUSCO genes, 634 (64.6%) and 674 (68.6%) were complete, 46 (4.7%) and 122 (12.4%) were duplicated, and 86 (8.8%) and 88 (9%) were fragmented, respectively. A stringent BLAST of missing BUSCO proteins on the predicted proteins found 141 of the missing BUSCO proteins (median e-value of 5.8e-18), achieving a possible complete rate of 83% [95]. Overall, the high proportion of input read mapping, high BUSCO scores, and complete incorporation of all contigs, suggests this latestH. glycines assembly is of high quality.

Improvements over existing soybean cyst nematode assemblies

This pseudomolecule assembly is a massive step forward in the genomics of plant-parasitic nematodes, increasing the ability of interspecies comparisons. To assess the contiguity and accuracy of our new assembly we used gene-based synteny with BLAST and i-ADHoRe [93] as well as direct chromosome alignments using Mummer3 (Figure 1). With the gene-based approach, 67Mb and 31.7Mb of synteny was found to the TN10 draft and the X12 genome assemblies, respectively. Using Mummer, these assessments rose to 92.4Mb and 43.4Mb, respectively. Considering that more than 61Mb of repeats are in the genome, synteny to 42-58% and 20-27% of the genome in the TN10 draft and X12 assemblies, respectively, is high.
Assignment of contigs to chromosomes was improved in this assembly compared to existing X12 assembly. These differences resulted in the identification of a number of large chromosomal misjoins in the X12 assemby: including multiple interchromosomal translocations and the misassignment of chromosome 9 (Figure 1; Supplemental Table 2). Surprisingly, after adjusting for these large chromosomal misjoins in X12, there were very few chromosomal rearrangements between the two lines of a highly adaptable species (Figure 1).

Gene Annotation

The gene annotation resulted in 22,465 gene models, encoding 23,933 transcripts with an average gene length of 4,569bp, values that are comparable to related species (Table 2). While the frequency of genes is substantially larger than the previously published X12 annotation (11,882), the propensity for parasites to duplicate genes involved in host-parasite interactions requires a novel approach to gene prediction. To prevent the obliteration of parasitism genes thought to be maintained at high copies in the nematode, we developed an annotation approach was taken to predict all transcribed elements in the genome, including repetitive elements. A genome without repeat masking was used to allow highly similar, high-copy number genes to be identified. However, because repetitive elements frequently reside in noncoding regions of genes, multiple genome-guided transcriptomes and gene predictions enabled the dissection of high-confidence gene models (Supplemental Table 3). This improvement in gene prediction is indicated by our total gene count (22,265). Our analyses included known parasitism genes and repeats missing from X12 (11,882) and produced a more highly contiguous genome than the previous TN10 assembly (29,769). Our average and median gene and transcript lengths are the largest among the compared species, while exon count per transcript has also increased relative to earlier annotations of H. glycines and other related species. Another line of evidence to support these gene predictions lies with the high proportion of genes that have functional annotations with 85.2% of predicted proteins or transcripts having homology to sequences in Interpro, Swissprot, NCBI NR, or NCBI NT databases (Supplemental Table 4; Supplemental Table 5).

Effector gene prediction

With a complete genome, we can now better understand the molecules that are exchanged between the parasite and host. The first step to identifying these molecules lies in characterizing transcripts that produce proteins with signal peptides and without transmembrane domains, which partitioned 1,514 transcripts from the 23,933 total transcripts, and which were attributed to 1,421/22,465 genes (Table 3). A second elementary step in identifying these molecules lies with attributing the previously published effectors to genes in the genome [11, 19]. Using DNA sequence similarity, 125 potential effector genes were identified with a minimum query alignment and sequence identity of 50%. Using the same parameters with protein query length and identity with Diamond, 362 effector-like protein-coding genes were identified. However, only 44/125 and 117/362 of these putative effectors encode secreted proteins, indicating that genes may be variable among SCN lines in their propensity to be secreted, a variability that may contribute to SCN virulence.

Expression of Effector genes

In the hopes of further resolving the genes important to the host-parasite exchange, we leveraged existingH. glycines RNA-seq (SRP122521). All possible comparisons were made between pre-parasitic (i.e., before root penetration) second-stage juvenile nematodes (PP), second-stage parasitic (i.e., after root penetration) nematodes on a susceptible host (C for compatible), and second-stage parasitic nematodes on a resistant host (IC for incompatible) (Supplemental Data 1). These data were integrated into a tabular database of genes, functional annotations, sequences, and differential expression. Using this database, we filtered genes in theH. glycines genome for key traits of genes involved in parasitism, including differential expression at a p-value of 0.05, >1 log2 fold change, signal peptide presence, and the absence of a transmembrane domain. Of the 1,421 genes we assessed, 61 genes were differentially expressed between the PP and C, 392 genes between PP and IC, and 609 in novel comparisons between C and IC samples (Table 3). Among these comparisons, we assessed which genes may be involved in host nucleus reprogramming by annotating NLS signals in these differentially expressed genes. We only found four and fourteen genes encoding proteins with NLS signals in the C and IC vs PP comparisons, respectively. However, comparisons between C and IC revealed 113 differentially expressed genes that were also secreted and nuclear targeted.
While effector genes upregulated in the preparasitic stages are likely to be associated with the migratory phase of parasitism, getting a list of candidate effectors for parasitic stages was less complete. Only four published effectors were upregulated in C vs PP, and three were upregulated in IC vs PP (Table 3). However, by comparing effector genes that were downregulated in parasitic IC samples but also upregulated in C samples vs PP, we discovered a number of effector genes that were downregulated when encountering resistance: 6E07[11], 4G06 (ubiquitin extension)[11], 4D06[11], three versions 45D07 type effectors (chorismate mutase) [14], 30C02 (defense suppressor) [29], four 2D01 type effectors (interacts with plant LRR)[96], 20E03[11], 12H04[11], 5A08 (RAN-binding, interacts with soybean LRRs [97]), and Gland14 (endopeptidase) [19]. While interesting, these decreases in expression could also be interpreted as the early stages of nematode death on a resistant host (IC), warranting further investigation into the mechanisms for these expression changes.