An excess of SVs was found on the X chromosome
Genomic SVs between the Chinese and Indian subspecies were identified by combining read-based and assembly-based strategies [59] across three methods: the read-based NanoSV [60] and Vulcan [61], and the assembly-based SyRI [53]. The two long-read-based methods can offset the effects of assembly errors, reducing the bias that an assembly-based strategy alone would introduce. In addition, because samples from the Chinese and Indian subspecies were compared (CR2 vs. mm10), the SV results should largely reflect subspecies-level divergence.
Specifically, Vulcan was the most sensitive caller, identifying 41,006 SVs compared with 8,786 for NanoSV and 14,492 for SyRI. Interestingly, Vulcan shared more SVs with SyRI than with NanoSV, even though Vulcan and NanoSV are both read-based methods whereas SyRI is assembly-based. Using a stringent criterion (>80% reciprocal overlap for each SV), we found 2,248 SVs congruent among all three methods (Figure 1C and Supplementary Table 2). It is worth noting that some previous studies have used the SVs shared among different algorithms as the bona fide SV call set [62]. For the specific purposes of this study, we adopted a reconciled approach, treating the congruent SVs as the lower bound and the SVs called independently by the three methods as the upper bound of the bona fide call set.
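The >80% reciprocal-overlap criterion can be illustrated with a minimal sketch. It assumes each call is reduced to a chromosome and a 0-based, half-open interval, and it ignores SV type and breakpoint tolerance, which a real comparison would likely also consider. The `SV` class, the helper names, and the toy call sets are hypothetical and are not the pipeline used in this study. Congruence is counted from the perspective of one "anchor" call set, so the count can depend slightly on which caller is chosen as the anchor.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int  # 0-based, half-open interval [start, end)
    end: int


def reciprocal_overlap(a: SV, b: SV) -> float:
    """Smaller of the two overlap fractions; 0.0 if the calls do not overlap."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def has_partner(sv: SV, calls: Sequence[SV], threshold: float) -> bool:
    """True if any call in `calls` overlaps `sv` reciprocally by more than `threshold`."""
    return any(reciprocal_overlap(sv, other) > threshold for other in calls)


def congruent_calls(anchor: Sequence[SV], *others: Sequence[SV],
                    threshold: float = 0.8) -> List[SV]:
    """Calls in `anchor` that have a reciprocal-overlap partner in every other call set."""
    return [sv for sv in anchor
            if all(has_partner(sv, calls, threshold) for calls in others)]


# Hypothetical toy call sets, one per method
vulcan = [SV("chr1", 100, 1100), SV("chr1", 5000, 5200)]
nanosv = [SV("chr1", 150, 1120)]
syri = [SV("chr1", 90, 1050), SV("chr1", 5000, 5200)]
print(congruent_calls(vulcan, nanosv, syri))  # [SV(chrom='chr1', start=100, end=1100)]
```

Under this sketch, the lower bound of the call set corresponds to `congruent_calls` over all three methods, while each method's own list plays the role of the upper bound. The all-pairs scan is quadratic and is meant only for clarity; an interval tree or a sorted sweep would be preferable at genome scale.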
Based on the 2,248 congruent SVs, we analyzed the chromosomal distribution of SVs. The X chromosome does not carry the highest proportion of SVs (4.54%); several autosomes carry more (7.87% for chromosome 1, 7.12% for chromosome 19, 5.83% for chromosome 11, etc.). However, after adjusting for the X-to-autosome effective population size ratio (3/4) and normalizing by chromosome physical length, we found an excess of SVs on the X chromosome relative to autosomes (0.89 vs. 0.80 SVs/Mb). When restricting the analysis to long SVs (>1,000 bp), the X chromosome was the most extreme outlier from the linear relationship shared by the autosomes (Figure 1D, p < 0.001). This pattern is consistent in principle with “faster-X divergence” [63] and with a previous observation in other mammalian species based on segmental duplications identified using SyRI alone [27].
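The two normalizations above can be sketched as follows, under stated assumptions. First, the Ne adjustment is assumed to divide the X-linked SV density (SVs per Mb) by the ratio 3/4. Second, the "linear pattern" is assumed to be an ordinary least-squares fit of SV count against chromosome length using autosomes only, with the X chromosome scored by its residual relative to the autosomal residual standard deviation. The section does not specify the axes of Figure 1D or the test behind p < 0.001, so this residual score is an illustrative stand-in rather than the analysis actually performed. The 150 Mb chromosome length used below is hypothetical.

```python
from typing import Dict, List, Tuple


def adjusted_density(n_sv: int, length_mb: float, ne_ratio: float = 1.0) -> float:
    """SVs per Mb divided by ne_ratio (ne_ratio = 3/4 for X is an assumption here)."""
    return n_sv / (length_mb * ne_ratio)


def ols_fit(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def standardized_residual(autosomes: Dict[str, Tuple[float, int]],
                          query: Tuple[float, int]) -> float:
    """Fit SV count ~ chromosome length on autosomes only, then return the
    query chromosome's residual divided by the autosomal residual SD."""
    xs = [length for length, _ in autosomes.values()]
    ys = [count for _, count in autosomes.values()]
    intercept, slope = ols_fit(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # n - 2 degrees of freedom, because two parameters were estimated
    sd = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    length, count = query
    return (count - (intercept + slope * length)) / sd


# Illustrative only: ~102 X-linked SVs (4.54% of 2,248) on a hypothetical 150 Mb X
print(round(adjusted_density(102, 150.0, ne_ratio=0.75), 3))  # 0.907
```

Fitting the line on autosomes alone keeps the X chromosome from pulling the fit toward itself, which is why its residual can reveal it as an outlier. A formal p-value would additionally require a prediction-interval or permutation test, and the sketch does not attempt one.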