2.1 Sampling and sequencing
Genomic DNA was extracted using a QIAamp DNA purification kit (Qiagen, Germany) according to the manufacturer’s instructions. The integrity and quality of the extracted DNA were evaluated using 1% gel electrophoresis. The DNA concentration was assessed using a Pultton DNA/Protein Analyzer (Plextech, USA). DNA samples with a total amount ≥ 20 μg, 1.8 < OD260/280 < 2.0, and a concentration > 12.5 ng/μl were used to construct the sequencing libraries.
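The acceptance criteria for library construction can be written as a simple check. The following is a minimal sketch only; the function name `passes_library_qc` and its parameter names are hypothetical and are not part of the original workflow.

```python
def passes_library_qc(total_ug: float, od260_280: float, conc_ng_per_ul: float) -> bool:
    """Return True if a DNA sample meets the thresholds stated in the text.

    Thresholds (from the text): total amount >= 20 ug,
    1.8 < OD260/280 < 2.0 (strict bounds), concentration > 12.5 ng/ul.
    """
    enough_dna = total_ug >= 20.0
    pure_enough = 1.8 < od260_280 < 2.0
    concentrated_enough = conc_ng_per_ul > 12.5
    return enough_dna and pure_enough and concentrated_enough


if __name__ == "__main__":
    # Illustrative values only, not measurements from this study.
    print(passes_library_qc(total_ug=25.0, od260_280=1.9, conc_ng_per_ul=15.0))
```

Note that the purity and concentration bounds are strict inequalities, so a sample at exactly OD260/280 = 2.0 or 12.5 ng/μl would be rejected under this reading.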
A 10× Genomics linked-read library was constructed using the standard protocol (10× Genomics, San Francisco, USA). Raw reads were produced using the BGISEQ-500 platform (BGI, Shenzhen, China), with read lengths of 2×100 bp. The raw reads were then subjected to quality control with FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc) and QC-Chain (Zhou et al., 2013). Duplicated reads, adaptor-contaminated reads, and reads with a quality value lower than 20 (corresponding to a 1% error rate) were removed.
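The link between a quality value of 20 and a 1% error rate follows from the Phred definition, Q = -10 log10(P). The sketch below illustrates that conversion and the three removal criteria. It is not the FastQC or QC-Chain implementation. It assumes Phred+33 quality encoding, treats the "quality value" of a read as its mean Phred score, treats duplicates as reads with identical sequences, and flags adaptor contamination by exact substring match. The function names and the adaptor sequence in the usage example are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str]  # (sequence, quality string)


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to an error probability: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_reads(reads: Iterable[Read], adaptor: str, min_q: float = 20.0) -> Iterator[Read]:
    """Yield reads that pass the adaptor, quality, and duplicate checks."""
    seen = set()
    for seq, qual in reads:
        if not seq or len(seq) != len(qual):
            continue  # malformed record
        if adaptor in seq:
            continue  # adaptor-contaminated (exact match, an assumption)
        if mean_phred(qual) < min_q:
            continue  # low quality (mean Phred, an assumption)
        if seq in seen:
            continue  # duplicate of a read already kept
        seen.add(seq)
        yield seq, qual


if __name__ == "__main__":
    # Q20 corresponds to an error probability of about 0.01 (1%).
    print(phred_to_error_prob(20))
    # Toy reads and a hypothetical adaptor, for illustration only.
    toy = [("ACGT", "IIII"), ("ACGT", "IIII"), ("AAGATCGG", "IIIIIIII"), ("TTTT", "####")]
    print(list(filter_reads(toy, adaptor="GATCGG")))
```

Real pipelines typically handle paired-end reads jointly and allow mismatches when detecting adaptors; this sketch omits both for brevity.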
To obtain a chromosome-scale genome assembly, we constructed a Hi-C library for sequencing. Genomic DNA in blood samples was fixed with 1% formaldehyde, and the fixation was terminated with 0.2 M glycine. The Hi-C library was prepared following a published Hi-C library protocol (Gong et al., 2018) and then sequenced on a BGISEQ-500 sequencing platform (BGI, Shenzhen, China).
For long-read sequencing, we constructed a SMRTbell library with a fragment size of 20 kb using the SMRTbell Template Prep Kit 1.0 (PacBio, USA) following the manufacturer’s protocol. The library was sequenced on a PacBio Sequel system, and data from one SMRT cell were generated.