An excess of SVs was found on the X chromosome
Genomic SVs between the Chinese and Indian subspecies were identified
by combining read-based and assembly-based strategies [59], using
three methods: NanoSV [60], Vulcan [61], and SyRI [53].
The two methods based on long-read data could counteract the effects of
assembly errors, thereby reducing the bias that would arise from using
the assembly-based strategy alone. In addition, since samples from the
Chinese and Indian subspecies (CR2 vs. mm10) were compared, the SV
results could largely represent subspecies-level divergence.
Specifically, Vulcan (41006 SVs) called SVs more sensitively than
both NanoSV (8786) and SyRI (14492). Interestingly, Vulcan and SyRI
jointly identified more SVs than Vulcan and NanoSV did, although Vulcan
and NanoSV are both read-based methods while SyRI is an assembly-based
method.
Using a rigorous criterion (>80% reciprocal overlap for each SV),
2248 congruent SVs were found among all three methods (Figure 1C and
Supplementary Table 2). It is worth noting that some previous studies
preferred to use the SVs shared among different algorithms as the bona
fide call-set of SVs [62]. For the specific purposes of this study, we
used a reconciled approach, treating the congruent SVs as the lower
bound and the independent SVs from the three methods as the upper bound
of the bona fide call-set.
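
The reciprocal-overlap criterion can be illustrated with a minimal
sketch. This is not the pipeline used in the study: the SV record, the
function names, the half-open coordinates, and the requirement that
matched SVs share a chromosome and SV type are all assumptions made for
illustration.

```python
from typing import List, NamedTuple


class SV(NamedTuple):
    """Hypothetical SV record; coordinates are half-open [start, end)."""
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    # Assumption: only SVs on the same chromosome and of the same type match.
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def congruent_calls(first: List[SV], second: List[SV], third: List[SV],
                    threshold: float = 0.8) -> List[SV]:
    """Keep calls from `first` matched in both other sets above `threshold`."""
    kept = []
    for sv in first:
        in_second = any(reciprocal_overlap(sv, o) > threshold for o in second)
        in_third = any(reciprocal_overlap(sv, o) > threshold for o in third)
        if in_second and in_third:
            kept.append(sv)
    return kept
```

Under this sketch, the lower bound corresponds to the output of
`congruent_calls`, while the upper bound would be the non-redundant
collection of calls from all three methods.
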
Based on the 2248 congruent SVs, we analyzed the chromosomal
distribution of SVs. The percentage of X-chromosomal SVs (4.54%) is
not the highest; several autosomes carry a larger share (7.87% for
chromosome 1, 7.12% for chromosome 19, 5.83% for chromosome 11, etc.).
However, after adjusting for the effective population size ratio
between the X chromosome and autosomes (3/4) and accounting for the
physical length of chromosomes, we found an excess of X-chromosomal SVs
relative to autosomal SVs (0.89/Mb vs. 0.80/Mb). When focusing on long
SVs (over 1000 bp), we found that the X chromosome was the most extreme
outlier from the linear pattern followed by the autosomes (Figure 1D,
p < 0.001). This pattern indicates, in principle, “faster-X
divergence” [63], consistent with a previous observation in other
mammalian species based on segmental duplications identified using only
SyRI [27].
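
The two adjustments described above can be sketched as follows. This is
a hedged illustration rather than the analysis actually performed. It
assumes that the density is the SV count per Mb divided by the
effective population size ratio (3/4 for the X chromosome, 1 for
autosomes), and that the linear pattern relates a per-chromosome SV
measure to chromosome length, which the text does not state explicitly.
All function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def sv_density_per_mb(n_svs: int, length_bp: int,
                      ne_ratio: float = 1.0) -> float:
    """SVs per Mb, scaled by an assumed effective population size ratio."""
    if length_bp <= 0 or ne_ratio <= 0:
        raise ValueError("length_bp and ne_ratio must be positive")
    return n_svs / (length_bp / 1e6) / ne_ratio


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def standardized_residual(xs: Sequence[float], ys: Sequence[float],
                          x_new: float, y_new: float) -> float:
    """Distance of a held-out point from the fitted line, in units of the
    sample standard deviation of the fitted points' residuals."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return (y_new - (intercept + slope * x_new)) / stdev(residuals)
```

Here the autosomes would supply `xs` and `ys`, and the X chromosome
would be passed as the held-out point. A large positive standardized
residual marks it as an outlier above the autosomal trend, although the
test behind the reported p-value may differ from this simple measure.
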