Genome-wide methylation and structural variations (SVs)
Nanopore sequencing can capture genome-wide methylation signals [64],
which provides a unique opportunity to examine epigenetic patterns
across a more diverse range of genomic elements. The genome-wide
methylation frequencies showed a similar distribution across autosomes
but a slightly flatter distribution on the X and Y chromosomes,
suggesting different methylation patterns between homozygotes and
hemizygotes (Supplementary figure 2 and Figure 2a).
Median methylation values differed significantly among the autosomes
(0.812), the X chromosome (0.800), the Y chromosome (0.778), and the
mitochondrial genome (0.032) (Figure 3a, Wilcoxon test, p < 0.001). The
distribution of methylation frequencies in the mitochondrial genome
showed a single peak at a small value (0.015), in sharp contrast to the
peaks at larger values for the nuclear chromosomes (>0.8, Figure 2B).
Thus, our nanopore data supported a low methylation level in the
mitochondrial genome.
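
The per-compartment medians reported above can be illustrated with a
minimal sketch. It assumes per-site methylation frequencies are
available as (chromosome, frequency) pairs; the chromosome naming
(`chr` prefix, `chrM` for the mitochondrial genome), the helper names,
and the toy values are illustrative assumptions, not the data or
pipeline used in this study.

```python
from collections import defaultdict
from statistics import median


def chrom_class(chrom):
    """Map a chromosome name to 'autosome', 'X', 'Y', or 'MT'.

    The naming scheme (optional 'chr' prefix, 'M'/'MT' for the
    mitochondrial genome) is an assumption for this sketch.
    """
    name = chrom.removeprefix("chr")
    if name in ("X", "Y"):
        return name
    if name in ("M", "MT"):
        return "MT"
    return "autosome"


def median_by_class(sites):
    """Return the median methylation frequency per chromosome class."""
    groups = defaultdict(list)
    for chrom, freq in sites:
        groups[chrom_class(chrom)].append(freq)
    return {cls: median(freqs) for cls, freqs in groups.items()}


# Toy per-site frequencies (hypothetical, for illustration only).
sites = [
    ("chr1", 0.82), ("chr2", 0.80), ("chr3", 0.85),
    ("chrX", 0.79), ("chrX", 0.81),
    ("chrY", 0.78),
    ("chrM", 0.01), ("chrM", 0.03), ("chrM", 0.02),
]
medians = median_by_class(sites)
for cls in ("autosome", "X", "Y", "MT"):
    print(cls, round(medians[cls], 3))
```

Grouping before summarizing keeps the mitochondrial sites, which are
few and lowly methylated, from being swamped by the nuclear sites.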
We comprehensively analyzed the methylation patterns of variants,
including SNPs, small indels (<50 bp), and SVs (>50 bp). We called SNPs
and small insertion-deletion mutations (indels, <50 bp) using
short-read data from population genomes comprising 27 samples that
cover both the Indian and Chinese subspecies. For SNPs, among variants annotated
to affect different gene structures, we found the lowest methylation
levels for variants in the 5’ UTR of genes (Figure 3A), consistent with
the previous finding that methylation frequencies are lowest around
transcription start sites (TSS) [65]. In addition, SNPs predicted to
have high impact (e.g., transcript ablation, frameshift) showed
significantly lower methylation frequencies than SNPs predicted to have
low (e.g., synonymous variants), modifier (e.g., 3’ UTR region, 5’ UTR
region), and moderate (e.g., missense, 3’ UTR deletion) impacts
(Wilcoxon rank sum test, p = 2.31e-5, 5.80e-12, and 0.016 for low,
modifier, and moderate impact SNPs, respectively) (Figure 3B).
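
As an illustration of the kind of comparison reported above, the
following is a minimal sketch of a two-sided Wilcoxon rank sum test
between two groups of methylation frequencies. It uses the normal
approximation without tie or continuity correction, which is a
simplification and may differ from the implementation used for the
reported p-values; the function names and the toy values are
hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank sum test via the normal approximation.

    Returns (U statistic for x, z score, p-value). No tie or
    continuity correction is applied (a simplification).
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


# Toy methylation frequencies (hypothetical, for illustration only).
high_impact = [0.60, 0.62, 0.65, 0.58, 0.61, 0.63]
low_impact = [0.80, 0.82, 0.79, 0.85, 0.81, 0.83]
u, z, p = rank_sum_test(high_impact, low_impact)
print(f"U = {u}, z = {z:.3f}, p = {p:.4f}")
```

A rank-based test suits methylation frequencies because they are
bounded between 0 and 1 and are typically far from normally
distributed.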
For SVs called with three methods that use long reads exclusively
(NanoSV, Vulcan, and SyRI), we found similar patterns in methylation
frequencies for different types of SVs (Figure 3C). Based on the
distribution of median methylation frequencies for all four SV types
(deletions, 0.89; insertions, 0.83; inversions, 0.79; and duplications,
0.65), deletions and duplications were the most and the least
methylated, respectively (Wilcoxon rank sum test, p < 0.05 for all
pairwise comparisons). For small indels, lengths of both deletions and
insertions showed significant positive correlations with methylation
levels (Figure 3D), with deletions showing a stronger positive
correlation than insertions (0.81 vs. 0.79). In addition, small indels
showed a significantly lower median methylation frequency than SVs
(0.80 vs. 0.83, Wilcoxon rank sum test, p = 0.016). These patterns
suggest that variants affecting longer DNA segments may have higher
levels of methylation.
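
The relationship between indel length and methylation level can be
sketched as a rank correlation. The text does not state which
correlation coefficient was used, so Spearman's rank correlation is
assumed here purely for illustration; the function names and the toy
values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Toy indel lengths (bp) and methylation levels (hypothetical).
lengths = [1, 2, 5, 10, 20, 40]
methylation = [0.70, 0.72, 0.75, 0.74, 0.80, 0.85]
rho = spearman(lengths, methylation)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based coefficient only asks whether methylation tends to rise
with length, without assuming that the increase is linear.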