Genome-wide methylation and structural variations (SVs)
Nanopore sequencing can capture genome-wide methylation signals [64], which provides a unique opportunity to examine epigenetic patterns across a more diverse range of genomic elements. Genome-wide methylation frequencies showed a similar distribution across autosomes but a slightly flatter distribution on the X and Y chromosomes, suggesting different methylation patterns between homozygotes and hemizygotes (Supplementary figure 2 and Figure 2a). Median methylation values differed significantly among autosomes (0.812), the X chromosome (0.800), the Y chromosome (0.778), and the mitochondrial genome (0.032) (Figure 3a, Wilcoxon test, p < 0.001). The distribution of methylation frequencies in the mitochondrial genome showed a single peak at a small value (0.015), in sharp contrast to the peaks at larger values for the nuclear chromosomes (>0.8, Figure 2B). Thus, our nanopore data supported a low methylation level in the mitochondrial genome.
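The per-class medians reported above can be illustrated with a minimal sketch. It assumes per-site methylation frequencies are available as (chromosome, frequency) pairs; the function names, the chromosome naming scheme ("chr1", "chrX", "chrM"), and the toy values are hypothetical and are not taken from the study's pipeline.

```python
from collections import defaultdict
from statistics import median


def chromosome_class(chrom):
    """Map a chromosome name to 'autosome', 'X', 'Y', or 'MT'.

    The 'chr' prefix and the 'M'/'MT' labels are assumed naming conventions.
    """
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name in ("X", "Y"):
        return name
    if name in ("M", "MT"):
        return "MT"
    return "autosome"


def median_methylation_by_class(sites):
    """Return the median methylation frequency per chromosome class.

    sites: iterable of (chromosome, methylation_frequency) pairs.
    """
    groups = defaultdict(list)
    for chrom, freq in sites:
        groups[chromosome_class(chrom)].append(freq)
    return {cls: median(vals) for cls, vals in groups.items()}


# Toy values for illustration only, not data from the study.
toy_sites = [
    ("chr1", 0.82), ("chr2", 0.80), ("chr3", 0.90),
    ("chrX", 0.79), ("chrX", 0.81),
    ("chrY", 0.78),
    ("chrM", 0.01), ("chrM", 0.03), ("chrM", 0.02),
]
print(median_methylation_by_class(toy_sites))
```

Grouping by chromosome class before taking the median keeps the mitochondrial sites, which are few, from being swamped by the nuclear sites.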
We comprehensively analyzed methylation patterns for variants including SNPs, small indels (<50bp), and SVs (>50bp). We called SNPs and small insertion–deletion mutations (indels, <50bp) using short-read data from population genomes comprising 27 samples covering both the Indian and Chinese subspecies. For SNPs, among variants annotated as affecting different gene structures, we found the lowest methylation levels for variants in the 5’ UTR of genes (Figure 3A), consistent with the previous finding that methylation frequencies are lowest around transcription start sites (TSS) [65]. In addition, SNPs predicted to have high impact (e.g., transcript ablation, frameshift, etc.) showed significantly lower methylation frequencies than SNPs predicted to have low (e.g., synonymous variants, etc.), modifier (e.g., 3’UTR region, 5’UTR region, etc.), or moderate (e.g., missense, 3’UTR deletion, etc.) impact (Wilcoxon rank sum test, p = 2.31e-5, 5.80e-12, and 0.016 for low, modifier, and moderate impact SNPs, respectively) (Figure 3B).
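As an illustration of the kind of comparison reported above, the following sketch implements a two-sided Wilcoxon rank sum (Mann-Whitney U) test using the normal approximation with a tie correction and no continuity correction. This is an assumption about one reasonable way to run such a test, not necessarily the implementation used in this study; the function name and the toy values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test via the normal approximation.

    Returns (U statistic for x, approximate p-value). Ties receive average
    ranks and the variance is tie-corrected; no continuity correction is used.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        t = j - i                     # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


# Toy methylation frequencies for illustration only, not data from the study.
high_impact = [0.10, 0.20, 0.30, 0.40, 0.50]
low_impact = [0.60, 0.70, 0.80, 0.90, 1.00]
u_stat, p_value = rank_sum_test(high_impact, low_impact)
print(u_stat, p_value)
```

The normal approximation is adequate for the large site counts typical of genome-wide data; for very small samples an exact test would be preferable.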
For SVs called exclusively from long reads with three methods (NanoSV, Vulcan, and SyRI), we found similar patterns in methylation frequencies for different types of SVs (Figure 3C). Based on the distribution of median methylation frequencies for the four SV types (deletions, 0.89; insertions, 0.83; inversions, 0.79; and duplications, 0.65), deletions and duplications were the most and least methylated, respectively (Wilcoxon rank sum test, p < 0.05 for all pairwise comparisons). For small indels, the lengths of both deletions and insertions were significantly positively correlated with methylation levels (Figure 3D), with deletions showing a stronger correlation than insertions (0.81 vs. 0.79). In addition, methylation frequencies of small indels had a significantly lower median than those of SVs (0.80 vs. 0.83, Wilcoxon rank sum test, p = 0.016). These patterns suggest that variants affecting longer DNA segments may have higher levels of methylation.
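The relationship between indel length and methylation level can be sketched as a rank correlation. The text does not state which correlation coefficient was used, so the choice of Spearman's coefficient here is an assumption, as are the function names and the toy values.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        for k in range(i, j):
            ranks[order[k]] = avg
        i = j
    return ranks


def pearson(a, b):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Toy indel lengths (bp) and methylation frequencies, for illustration only.
lengths = [1, 2, 5, 10, 20, 40]
methylation = [0.70, 0.72, 0.71, 0.80, 0.85, 0.90]
print(spearman(lengths, methylation))
```

A rank-based coefficient is robust to the skewed length distribution of small indels, where most events span only a few base pairs.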