2.1 Sampling and sequencing
Genomic DNA was extracted using a QIAamp DNA purification kit (Qiagen,
Germany) according to the manufacturer’s instructions. The integrity and
quality of the extracted DNA were evaluated using 1% gel
electrophoresis. The DNA concentration was assessed using a Pultton
DNA/Protein Analyzer (Plextech, USA). DNA with a total amount ≥ 20 μg,
1.8 < OD260/280 < 2.0, and a concentration > 12.5 ng/μl was used to
construct the sequencing libraries.
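The three acceptance criteria above can be written as a single check. The following is a minimal sketch for illustration only, not the procedure used in this study; the function name and argument names are hypothetical, and only the thresholds come from the text.

```python
def passes_library_qc(total_ug, od260_280, conc_ng_per_ul):
    """Return True if a DNA sample meets the thresholds stated in the text.

    total_ug        -- total DNA amount in micrograms (must be >= 20)
    od260_280       -- OD260/280 ratio (must lie strictly between 1.8 and 2.0)
    conc_ng_per_ul  -- concentration in ng/ul (must be > 12.5)
    """
    return (
        total_ug >= 20.0
        and 1.8 < od260_280 < 2.0
        and conc_ng_per_ul > 12.5
    )


# Hypothetical sample values, used only to show how the check is called.
print(passes_library_qc(total_ug=25.0, od260_280=1.9, conc_ng_per_ul=50.0))
```

Note that the OD260/280 bounds are strict inequalities, as written in the text, so a sample with a ratio of exactly 1.8 or 2.0 would be rejected by this sketch.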
A 10× Genomics linked-read library was constructed using the standard
protocol (10× Genomics, San Francisco, USA). Raw reads were produced
on the BGISEQ-500 platform (BGI, Shenzhen, China), with read lengths of
2×100 bp. The raw reads were then quality-checked and filtered with FastQC
(http://www.bioinformatics.babraham.ac.uk/projects/fastqc) and QC-Chain
(Zhou et al., 2013). Duplicated reads, adaptor-contaminated reads, and
reads with a quality value lower than 20 (corresponding to a 1% error
rate) were removed.
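The correspondence between a quality value of 20 and a 1% error rate follows from the Phred definition, Q = -10 log10(P). The sketch below illustrates that conversion and a simple read-level quality check. It is not the FastQC or QC-Chain implementation: the function names are hypothetical, Phred+33 encoding of the quality string is an assumption, and applying the threshold to the mean quality of a read is also an assumption, since the text does not specify how the per-read quality value was computed.

```python
from typing import List


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string: str) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 ASCII encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def read_passes(quality_string: str, min_q: int = 20) -> bool:
    """Keep a read if its mean quality is at least min_q.

    Using the mean is an assumption made for this sketch; empty reads fail.
    """
    scores = decode_phred33(quality_string)
    if not scores:
        return False
    return sum(scores) / len(scores) >= min_q


# Q20 corresponds to an error probability of about 0.01 (1%).
print(round(phred_to_error_prob(20), 6))
```

Under these assumptions, a read whose quality string is all `I` characters (Q40) passes, whereas one made up of `+` characters (Q10) is discarded.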
To obtain a chromosome-scale genome assembly, we constructed a Hi-C
library for sequencing. Genomic DNA in blood samples was fixed with
formaldehyde at a concentration of 1%, and the fixation was terminated
using 0.2 M glycine. The Hi-C library was prepared following a published
Hi-C library protocol (Gong et al., 2018) and then sequenced on a
BGISEQ-500 sequencing platform (BGI, Shenzhen, China).
For long-read sequencing, we constructed a SMRTbell library with a
fragment size of 20 kb using the SMRTbell template preparation kit 1.0
(PacBio, USA) following the manufacturer’s protocol. The library was
sequenced on a PacBio Sequel system, and data from one SMRT cell were
generated.