Identification and distribution of structural variations (SVs)
SVs were identified using two complementary approaches: a reads-based approach (long-read mapping) and an assembly-based approach (assembly comparison). For the reads-based approach, we used two programs, NanoSV [60] and Vulcan [61], to detect SV signals. NanoSV uses split and gapped read alignments to define the breakpoint junctions of SVs. Its input alignments were produced by mapping the long reads to the reference genome (Mmul_10) with LAST v1256 [114] and processing them with Sambamba v0.8.2 [115]. Vulcan integrates several pipelines, including dual-mode alignment of the long reads with minimap2 [109] and NGMLR v0.2.7 [116], followed by SV calling with Sniffles2 [116]. The assembly-based approach used SyRI v1.6 [53]. The SV call sets from these methods were compared with BEDTools v2.30 [117].
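Before calls from the read-based callers and SyRI can be compared with BEDTools, they must share one coordinate convention. BED intervals are 0-based and half-open, whereas VCF-style positions are 1-based. The sketch below assumes each call has already been parsed into a chromosome plus a 1-based, inclusive start and end of the affected reference segment. Insertions are represented as a single base at the insertion site, which is an assumption of this sketch. The `SVCall` container and the `to_bed_line`/`calls_to_bed` helpers are hypothetical and are not part of any of the cited tools.

```python
from typing import Iterable, NamedTuple


class SVCall(NamedTuple):
    """Hypothetical container for one parsed SV call.

    Coordinates are assumed to be 1-based and inclusive, describing the
    reference segment affected by the SV (insertions: one base at the
    insertion site, an assumption made for this sketch).
    """
    chrom: str
    start: int   # 1-based first affected reference base
    end: int     # 1-based last affected reference base (inclusive)
    svtype: str  # e.g. "DEL", "INS", "INV"
    caller: str  # e.g. "NanoSV", "Vulcan", "SyRI"


def to_bed_line(call: SVCall) -> str:
    """Convert one 1-based inclusive call into a 0-based half-open BED line."""
    if call.start < 1 or call.end < call.start:
        raise ValueError(f"invalid coordinates: {call}")
    bed_start = call.start - 1  # BED start is 0-based
    bed_end = call.end          # BED end is exclusive, so the inclusive 1-based end carries over
    name = f"{call.caller}:{call.svtype}"  # keep the caller and SV type in the name column
    return f"{call.chrom}\t{bed_start}\t{bed_end}\t{name}"


def calls_to_bed(calls: Iterable[SVCall]) -> str:
    """Render calls as BED text, sorted by chromosome name, then start and end."""
    ordered = sorted(calls, key=lambda c: (c.chrom, c.start, c.end))
    return "".join(to_bed_line(c) + "\n" for c in ordered)


if __name__ == "__main__":
    # Toy call, for illustration only.
    print(to_bed_line(SVCall("1", 101, 200, "DEL", "NanoSV")))
```

A 100 bp deletion covering bases 101 to 200 therefore becomes the BED interval `100 200`, and the interval length (`end - start`) equals the SV length.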
Consensus SVs, defined as calls from different methods whose regions overlapped each other by at least 80% of their lengths (≥80% reciprocal overlap), formed the lower bound of a reliable call set. To capture potentially consistent SV patterns across algorithms, the SVs from all methods were compared and combined to form the upper bound of a reliable call set.
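The lower and upper bounds can be illustrated with a small sketch that rests on several assumptions:

- Each call set has already been converted to 0-based, half-open intervals, as in BED.
- Two calls count as shared when their reciprocal overlap reaches at least 0.8. Reciprocal overlap is the overlap length divided by each call's length, keeping the smaller of the two ratios.
- A call from the first set counts as consensus only if every other set contains such a match. SV type is not checked.
- The upper bound is taken as the union of all calls. Calls that reciprocally overlap an already-kept call by at least 0.8 are collapsed into it.

These rules, along with the names `Interval`, `reciprocal_overlap`, `lower_bound`, and `upper_bound`, are hypothetical and only illustrate the idea. They are not the exact procedure used here. With BEDTools itself, a comparable pairwise filter is `bedtools intersect -a A.bed -b B.bed -f 0.8 -r -u`.

```python
from typing import NamedTuple, Sequence


class Interval(NamedTuple):
    """Hypothetical SV interval in BED convention (0-based, half-open)."""
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of overlap/len(a) and overlap/len(b); 0.0 if the calls do not overlap."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0  # also covers zero-length intervals, so the divisions below are safe
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def lower_bound(callsets: Sequence[Sequence[Interval]], min_ro: float = 0.8) -> list[Interval]:
    """Calls from the first set matched (reciprocal overlap >= min_ro) in every other set."""
    first, *others = callsets
    return [
        a for a in first
        if all(any(reciprocal_overlap(a, b) >= min_ro for b in other) for other in others)
    ]


def upper_bound(callsets: Sequence[Sequence[Interval]], min_ro: float = 0.8) -> list[Interval]:
    """Union of all calls; a call matching an already-kept call is collapsed into it."""
    merged: list[Interval] = []
    for calls in callsets:
        for c in calls:
            if not any(reciprocal_overlap(c, m) >= min_ro for m in merged):
                merged.append(c)
    return merged


if __name__ == "__main__":
    # Toy intervals standing in for three call sets (not real data).
    nanosv = [Interval("1", 100, 200, "DEL"), Interval("1", 1000, 1100, "DEL")]
    vulcan = [Interval("1", 110, 205, "DEL"), Interval("2", 500, 600, "INV")]
    syri = [Interval("1", 105, 200, "DEL")]
    print(lower_bound([nanosv, vulcan, syri]))  # [Interval(chrom='1', start=100, end=200, svtype='DEL')]
    print(len(upper_bound([nanosv, vulcan, syri])))  # 3
```

Every call is compared against every other call, so this sketch runs in quadratic time. For genome-wide call sets, an interval index or BEDTools itself would be the practical choice.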