Chromosome structural variation and GO analysis
To investigate the differences between subgenome A and subgenome D, we
performed synteny analysis between paralogs in the P. tomentosa genome. This revealed collinear in-paralogous gene pairs and suggested
general collinearity at the subgenome level, with dispersed collinear
blocks among homologous and nonhomologous chromosomes (Fig. 5, center).
We found 65,864 paralogous gene-pairs, 1,434 collinear blocks, and
65,444 collinear gene-pairs between the two subgenomes (Table S14). We
infer that these may have arisen from duplication events that occurred
in Populus prior to its divergence as a section of Populus.
To study genome-wide structural variation (SV), including copy number
variation (CNV), deletions (DEL), insertions (INS), inversions (INV),
and translocations (TRANS) among chromosome pairs (Fig. 5, rings 1-5,
referred to by circled numbers such as “①” hereafter), we aligned the
chromosomes using MUMmer and subsequently called SVs using SVMU
(Structural Variants from MUMmer) 0.3
(https://github.com/mahulchak/svmu). The results indicated that there
were abundant chromosome structural variations in the P.
tomentosa genome. Across the whole genome we detected 15,480 structural
variations in total, of which INS (6,654) and DEL (6,231) accounted for
the majority (83%). INV, TRANS and CNV numbered 1,602, 694 and 299,
respectively, and together accounted for the remaining
17% of the total number of SVs observed (Table S15). The vast majority
of INS, DEL, and CNV variations occurred between homologous chromosome
pairs, whereas TRANS were generally seen between non-homologous pairs
(Table S15, Fig. S9).
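As a quick consistency check, the shares of each SV type can be recomputed from the counts reported above. This is a minimal sketch: the counts are taken from the text, while the variable and function names are illustrative and not part of the original analysis.

```python
# SV counts by type, as reported in the text (Table S15 in the source).
sv_counts = {"INS": 6654, "DEL": 6231, "INV": 1602, "TRANS": 694, "CNV": 299}


def sv_shares(counts):
    """Return each SV type's share of the total, as a percentage."""
    total = sum(counts.values())
    return {sv_type: 100.0 * n / total for sv_type, n in counts.items()}


total_svs = sum(sv_counts.values())
shares = sv_shares(sv_counts)

# Combined share of insertions and deletions.
indel_share = shares["INS"] + shares["DEL"]
# Combined share of the three less frequent types.
other_share = shares["INV"] + shares["TRANS"] + shares["CNV"]

print(f"total SVs: {total_svs}")
print(f"INS + DEL: {indel_share:.1f}%")
print(f"INV + TRANS + CNV: {other_share:.1f}%")
```

The five counts sum to the reported genome-wide total, so the two combined shares must add up to 100%.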
By plotting the distribution of five SV types along 38 P.
tomentosa chromosomes, we observed that the 299 CNVs had an
irregular and sporadic distribution across the whole genome (Fig. 5).
Relatively high CNV densities were seen on Chr17A and Chr17D (0.54/Mb)
and on Chr09A and Chr09D (0.47/Mb), whereas comparably low densities
were found on Chr06A and Chr06D (0.13/Mb), Chr13A and Chr13D (0.15/Mb),
and Chr07A and Chr07D (0.18/Mb) (Fig. 5②). We also noticed that most DELs
were almost evenly distributed throughout the whole genome, showing a
slight preference for the telomere regions of Chr12A, Chr12D, Chr17A,
Chr17D, Chr18A and Chr18D (Fig. 5③). Similarly, INSs were present at
high density and showed a slight preference for the telomere regions of
Chr07A, Chr07D, Chr15A, Chr15D, Chr18A and Chr18D (Fig. 5④). In
contrast, INVs had a more uneven distribution across the genome (Fig.
5⑤). INVs were more abundant on Chr01A and Chr01D, whereas their
distribution was limited on other chromosomes. TRANS were very sparsely
distributed, with only a few detected on Chr02D, Chr07D,
Chr08D, Chr13D and Chr14D (Fig. 5⑥).
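Per-chromosome densities such as those quoted above are simply SV counts divided by chromosome length in Mb, and a ring plot additionally needs counts in windows along each chromosome. The following is a minimal sketch of both calculations. The chromosome names, lengths, positions and the 1 Mb window size are hypothetical placeholders, not values from the P. tomentosa assembly, and the source does not state which window size was used for Fig. 5.

```python
from collections import Counter


def sv_density_per_mb(sv_chroms, chrom_lengths_bp):
    """Count SVs per chromosome and divide by chromosome length in Mb."""
    counts = Counter(sv_chroms)
    return {
        chrom: counts[chrom] / (length_bp / 1e6)
        for chrom, length_bp in chrom_lengths_bp.items()
    }


def windowed_counts(positions, chrom_length_bp, window_bp=1_000_000):
    """Count SV start positions (0-based) in fixed-size windows."""
    n_windows = -(-chrom_length_bp // window_bp)  # ceiling division
    bins = [0] * n_windows
    for pos in positions:
        # Clamp to the last window in case a position equals the length.
        bins[min(pos // window_bp, n_windows - 1)] += 1
    return bins


# Hypothetical example data (not from the source).
example_lengths = {"chrA": 20_000_000, "chrB": 10_000_000}
example_sv_chroms = ["chrA"] * 10 + ["chrB"] * 2
example_density = sv_density_per_mb(example_sv_chroms, example_lengths)

example_bins = windowed_counts(
    [100, 1_500_000, 1_700_000, 2_999_999], chrom_length_bp=3_000_000
)

print(example_density)
print(example_bins)
```

Windowed counts make a preference for chromosome ends visible as higher values in the first and last bins.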
We performed GO enrichment analysis for the genes located in the
15,480 SV regions using the Plant GoSlim database, and detected 23 GO
categories that were significantly over-represented relative to the
whole set of genes (Fig. 6). Ten of them (“motor activity,” “transporter
activity,” “DNA binding,” “transport,” “metabolic process,”
“lysosome,” “nuclear envelope,” “peroxisome,” “cell wall” and
“extracellular region”) were over-represented in genes affected by
INS; three (“chromatin binding,” “translation” and “ribosome”)
in genes affected by CNV; three (“hydrolase
activity,” “response to biotic stimulus” and “lipid metabolic
process”) in genes affected by both INS and
TRANS; two (“cell differentiation” and “growth”)
in genes affected by INV; two (“vacuole” and
“circadian rhythm”) in genes affected by
TRANS; one (“endosome”) in genes affected by
both DEL and CNV; one (“carbohydrate binding”)
in genes affected by DEL, CNV and TRANS; and one (“plasma
membrane”) in genes affected by both CNV and
TRANS. Overall, functional annotation showed enrichments associated with
all of the major GO categories (Fig. 6a).
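Over-representation of a GO category in an SV-affected gene set is commonly assessed with a one-sided hypergeometric (Fisher-type) test against the whole gene set. The source does not state which test or multiple-testing correction was used, so the sketch below is an assumption about the general approach rather than the authors' procedure, and all names and numbers in it are illustrative.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: genes in the background (whole gene set)
    K: background genes annotated with the GO category
    n: genes in the SV-affected set
    k: SV-affected genes annotated with the GO category
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability mass for k, k+1, ..., min(K, n) annotated genes.
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


# Toy example (hypothetical numbers): 10 genes in total, 4 carry the
# category, 3 genes are SV-affected, and 2 of those carry the category.
p_toy = hypergeom_upper_tail(k=2, K=4, n=3, N=10)
print(f"toy p-value: {p_toy:.4f}")
```

In a real analysis the test would be run once per GO category and SV type, and the resulting p-values would need correction for multiple testing before categories are declared significant.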
To explore the biological importance of the SVs, we further annotated
genes that were highly enriched in the above GO categories. We found that
many genes in CNV, INS and DEL regions are involved in
disease-resistance and sugar metabolism pathways (Fig. 6b). For
example, Potom05G0191000 and Potom05G0207500 (with CNV) and
Potom06G0303900 and Potom01G0355800 (with DEL) all encode LRR
receptor-like serine/threonine-protein kinase FLS2, which may be
important for disease resistance. The disease-resistance genes in INS
regions are mainly annotated as nitric oxide synthase, enhanced disease
susceptibility 1 protein and pathogenesis related protein 1, which are
involved in plant hormone signal transduction and plant-pathogen
interaction. More interestingly, we found 3 copies of both
Potom05G0191000 and Potom05G0207500 in the P. adenopoda subgenome,
and 11 copies of both genes in the P. alba var. pyramidalis subgenome. Previous studies in Glycine max (McHale et al., 2012)
also indicated that structural variations such as CNV are common in
genes related to disease resistance and biological stress. Higher copy
numbers of both Potom05G0191000 and Potom05G0207500 may help explain why
the elite individual LM50 shows strong disease resistance, a trait for
which it is known among forest growers. Of course, this hypothesis needs
functional validation.
We also found that many genes involved in carbohydrate metabolism had
structural variations, including CNV, DEL and INS. Examples include
genes annotated as UDP-glucuronate 4-epimerase,
alpha-1,4-galacturonosyltransferase, and beta-galactosidase (Fig. 6b).
In addition, Potom03G0262900 and Potom01G0217800, which showed INS
variation, were annotated as ADP sugar diphosphatase and pectinesterase
and are involved in ribose phosphorylation and pentose and glucuronate
interconversions, respectively; they may be important for energy and
growth. Finally, it is well known that centromeres and
telomeres play an important role in maintaining chromosome stability.
Interestingly, we also found that three genes, Potom01G0282700,
Potom12G0168500 and Potom12G0040500, showed INS variation and are
involved in meiotic DNA break processing and repair, chromatin
silencing at rDNA, and histone methylation. These genes may play a role
in maintaining chromosome structure or reducing the rate of meiotic
recombination that we observed.