Importantly, the success of this approach for generating whole
mitogenomes depends on the availability of high-quality material and on
proper handling that preserves the integrity and quality of high
molecular weight (HMW) DNA. In this study, we used a modified
phenol-chloroform method (Ramón-Laca et al., 2021), applied either
directly to crude lysate or after mitochondrial isolation, that avoids
the pipetting steps and other manipulation of the DNA that cause
fragmentation.
Preventing DNA shearing is crucial to avoid nuclear copies of
mitochondrial DNA (Numts), to preserve gene order, and to prevent
linearization of the circular mitochondrial genomes; linearized mtDNA
would be lost during the nDNA-depletion enrichment step, which digests
linear DNA.
High concentrations of target DNA (i.e. mitochondrial DNA) are required
to achieve the deep read coverage needed to overcome the error rates of
Oxford Nanopore sequencing. In this study, phenol-chloroform extractions
with a phase lock yielded high DNA concentrations while maintaining good
DNA quality (Table 2). However, careful handling is not the only factor
that determines DNA integrity. The range of DNA integrity observed across
the different sample preservation conditions in this study shows how
quickly DNA degrades. We encourage researchers either to preserve samples
on dry ice or at -80 °C as soon as possible and to extract DNA promptly
after collection, or to preserve the tissue in Longmire buffer at room
temperature from the moment of collection. In the latter case, however,
only the targeted mitosequencing approach is possible, because cells and
organelles are already lysed and the DNA is in suspension, so
mitochondria can no longer be isolated.
The results of the enrichment process (proportion of reads on target,
read depth, fragment length, and the most suitable method) depend
heavily on the tissue type and its level of degradation, and thus on the
quality and quantity of the DNA and on the enrichment treatment. The
genome skimming results demonstrated that the proportion of reads on
target does not by itself guarantee a reliable mitogenome; coverage and
fragment length are what determine a good result. The mitochondrial DNA
content of skeletal muscle (~0.3% of gDNA) already represents some
enrichment compared with the ~0.1% of total DNA expected for mtDNA
(Robin & Wong, 1988). Purifying DNA from isolated mitochondria further
enriched the proportion of target DNA twofold for heart and 26-fold for
skeletal muscle. The effect of nDNA depletion was not measured in this
study but was observed empirically as a decrease in the total amount of
DNA in the sample (results not shown).
When the same chinook sample was processed with both enrichment methods,
the targeted mitosequencing approach more than tripled the proportion of
reads on target, with enrichment of 67- to 196-fold (Table 2).
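The distinction drawn above between the proportion of reads on target and the coverage and fragment length that actually determine a good assembly can be made concrete with a small summary function. This is a minimal Python sketch, not the pipeline used in this study (analyses were run in Geneious Primer); it assumes each read has already been classified as on or off target, for example by mapping to a draft mitogenome. The `Read` class, the function names and the 0.1% default baseline (the expected mtDNA fraction quoted above) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical per-read record: length in bp and on/off-target status."""
    length: int
    on_target: bool


def n50(lengths):
    """Read length at which the longest-first cumulative sum reaches half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no reads


def enrichment_summary(reads, genome_length, baseline_fraction=0.001):
    """Summarize on-target proportion, mean depth, read N50 and fold enrichment.

    baseline_fraction is the assumed mtDNA share before enrichment (0.1% here).
    """
    on_lengths = [r.length for r in reads if r.on_target]
    fraction = len(on_lengths) / len(reads) if reads else 0.0
    return {
        "fraction_on_target": fraction,
        # Mean depth: on-target bases spread over the mitogenome length.
        "mean_depth": sum(on_lengths) / genome_length,
        "on_target_n50": n50(on_lengths),
        "fold_enrichment": fraction / baseline_fraction,
    }
```

Two libraries with the same fraction on target can differ greatly in mean depth if one consists of short fragments, which is why the fraction alone is a poor predictor of a reliable mitogenome.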
To date, the average read depth for mitochondrial genomes obtained with
PCR-free, long-read genome skimming has been low [e.g. 9× for the
Brazilian buffy-tufted-ear marmoset Callithrix aurita and 78× for fresh
liver of a rodent of the genus Melanomys (Franco-Sierra & Díaz-Nieto,
2020; Malukiewicz et al., 2021)]. With Nanopore sequencing, our genome
skimming analysis of Merluccius productus achieved only 0.08% of reads
on target and 8.9× coverage, with some regions not covered. Using
short-read genome skimming data from Illumina HiSeq, Margaryan et al.
(2021) obtained a mean of the median coverage depths of 1,170.8× (range
ca. 27–12,208×) across 192 assemblies of Danish vertebrate species,
with the fraction of mtDNA reads averaging around 0.54% (range ca.
0.005–5.62%) depending on tissue preservation conditions. In this
study, the enrichment methods raised the overall contig coverage to
hundreds or thousands of reads, depending on the tissue, extraction
method and treatment (Table 3), with the targeted mitosequencing method
providing the deepest coverage. Not surprisingly, the coverage obtained
with targeted mitosequencing is much higher than that of other
Cas9-targeted assays of nuclear regions. Gilpatrick et al. (2020)
obtained 80× coverage for a nuclear gene of similar length to the
mitogenomes (18 kbp), and 680× when targeting the region with multiple
cuts. Because mitochondrial copies can outnumber nuclear copies up to
500-fold, our results are comparable to the multi-cut result of that
study. The cut sites of the targeted mitosequencing approach are notably
less covered than the rest of the mitogenome (File S3.3), but these
sites are precisely the least variable regions of the mitogenome among
fish. This lower coverage likely reflects the directional bias created
at the cut site by Cas9 ribonucleoprotein complexes that remain bound to
the cleaved DNA fragments distal to the PAM site.
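The uneven coverage described above (regions left uncovered by genome skimming and weaker coverage at the Cas9 cut sites) is easiest to inspect as a per-base depth profile. The Python sketch below is an illustrative assumption rather than the procedure used here: it takes read alignments as hypothetical 0-based, half-open (start, end) intervals on a circular reference, builds the depth with a difference array, and flags stretches below an arbitrary fraction (20% by default) of the median depth.

```python
from statistics import median


def depth_profile(intervals, genome_length):
    """Per-base depth from (start, end) alignments on a circular reference.

    Intervals are 0-based and half-open; an alignment with end > genome_length
    is assumed to wrap past the origin and is split in two. Each alignment is
    assumed to be shorter than the genome.
    """
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        if end > genome_length:  # alignment crosses the origin of the circle
            diff[start] += 1
            diff[genome_length] -= 1
            diff[0] += 1
            diff[end - genome_length] -= 1
        else:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for position in range(genome_length):
        running += diff[position]
        depth.append(running)
    return depth


def low_coverage_regions(depth, fraction=0.2):
    """Half-open (start, end) runs whose depth is below fraction * median depth."""
    threshold = fraction * median(depth)
    regions, start = [], None
    for position, value in enumerate(depth):
        if value < threshold and start is None:
            start = position
        elif value >= threshold and start is not None:
            regions.append((start, position))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions
```

In a Cas9-targeted library, flagged runs coinciding with guide positions would be expected, whereas scattered gaps elsewhere, as in the genome skimming data, point to insufficient overall depth.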
Sanger and massively parallel short-read sequencing can fail to resolve
challenging tandem repeat regions in the non-coding regions of the mtDNA
of some organisms (Filipović et al., 2021; Kinkar et al., 2020), such as
the tandem repeat insertion in the control region of eulachon or the
44 bp poly-G stretch in S. leucopsarus found in this study; long-read
sequencing can overcome this obstacle. We expect a few scattered errors
in the homopolymeric regions of the mitogenomes generated here
(sequencing error rates were ≥2% at the time of this study with Guppy 5
super-accuracy (sup) basecalling and R9.4 flow cells), but these should
be resolved as basecalling algorithms evolve or, ultimately, with the
higher-yield, higher-accuracy sequencing kit and flow cells (SQK-LSK114
and R10.4 flow cells) released as this manuscript was being finalized,
which have been shown to reach >99% accuracy. These improvements will
presumably allow the detection of single nucleotide variants and
possibly heteroplasmy (Keraite et al., 2022). Despite these errors, the
mitogenomes generated in this study are likely to be more accurate than
those from previous efforts, because the methods overcome the difficulty
of sequencing tandem repeats, long insertions, repetitions and
rearrangements that can occur in vertebrate mitogenomes (Formenti et
al., 2021), provide high read depth, favor long DNA fragments and
increase the proportion of target DNA. To compensate for sequencing
errors at the current error rate of R9.4 flow cells, we recommend a read
depth of 50–75× for reliable, accurate mitogenomes. Srivathsan et al.
(2021) demonstrated that 25–50× is sufficient with R10.3 flow cells, and
this requirement is expected to decrease further with new improvements.
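The recommended depth can be turned into a rough sequencing target by dividing the bases needed on the mitogenome by the fraction of sequenced bases that fall on it. The Python sketch below is back-of-the-envelope arithmetic under stated assumptions: a mitogenome length of about 16.5 kb (assumed here as typical for fishes), and an on-target fraction measured in bases, which equals the read-based proportions reported above only if on- and off-target reads have similar lengths. The 50% on-target value for an enriched library is hypothetical.

```python
def required_yield_bp(target_depth, genome_length_bp, on_target_fraction):
    """Total bases to sequence so on-target bases reach target_depth on average."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    on_target_bases = target_depth * genome_length_bp
    return on_target_bases / on_target_fraction


def expected_depth(total_yield_bp, genome_length_bp, on_target_fraction):
    """Inverse calculation: mean depth expected from a given total yield."""
    return total_yield_bp * on_target_fraction / genome_length_bp


MITOGENOME_BP = 16_500  # assumed typical fish mitogenome length

# Genome skimming at 0.08% on target versus a hypothetical enriched library at 50%.
for fraction in (0.0008, 0.5):
    for depth in (50, 75):  # depth range recommended above for R9.4 flow cells
        gb = required_yield_bp(depth, MITOGENOME_BP, fraction) / 1e9
        print(f"{fraction:.2%} on target, {depth}x: {gb:.3f} Gb")
```

At 0.08% on target, reaching 50× would require roughly 1 Gb of sequence, whereas an enriched library needs only a few megabases, which is the practical argument for enrichment before sequencing.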
The mitoenrichment method can be applied directly to mitochondrial
isolates from any fish and is particularly useful when no reference
mitogenome of the species of interest, or of a closely related species,
is available for designing Cas9 guides for the targeted mitosequencing
method. However, mitochondrial isolation can reduce the amount of mtDNA,
and some nDNA may be co-purified. Exonuclease V digestion helps reduce
the background caused by nDNA, but the removal is incomplete. The Rapid
Sequencing Kit (ONT) uses a transposase followed by rapid adapter
attachment, which makes library preparation quick and straightforward.
We speculate that this method will also work on other phyla, provided
that the target mitochondria are well represented relative to
microorganisms (i.e. there is little contamination from symbionts or gut
microbiota), as suggested by preliminary work on bivalves (results not
shown; Ramón-Laca, personal observation). In contrast, the Cas9 guide
RNAs developed for the targeted mitosequencing method should be useful
for most fish, particularly the guides that cut the 12S and 16S rRNA
genes. The two guides located in tRNA-Gly may be less universal, because
this region is less conserved and mismatches are more likely. Both
methods take about the same time, around 2.5 h from DNA treatment after
extraction to loading the library on the flow cell, but the
mitoenrichment method requires an additional mitochondrial isolation
step before DNA extraction.
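Whether a guide designed in this study is likely to cut the mitogenome of another species can be screened in silico by counting mismatches between the guide and the closest candidate site. The Python sketch below is a simplification with several assumptions: it assumes a SpCas9-style guide with an NGG PAM immediately 3′ of the protospacer, scans both strands of a target sequence, and reports only the total number of mismatches, ignoring mismatch position relative to the PAM and DNA or RNA bulges, which dedicated guide-scoring tools take into account. The function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def best_site(guide, target):
    """Closest protospacer followed by an NGG PAM, on either strand.

    Returns (mismatches, strand, position) or None if no NGG PAM fits.
    For the '-' strand, position is counted on the reverse complement.
    """
    guide = guide.upper()
    k = len(guide)
    best = None
    strands = (("+", target.upper()), ("-", reverse_complement(target.upper())))
    for strand, seq in strands:
        for i in range(len(seq) - k - 2):
            if seq[i + k + 1:i + k + 3] != "GG":  # PAM is N-G-G after the protospacer
                continue
            mismatches = sum(a != b for a, b in zip(guide, seq[i:i + k]))
            if best is None or mismatches < best[0]:
                best = (mismatches, strand, i)
    return best


guide = "GATTACAGATTACAGATTAC"  # hypothetical 20-nt spacer
species = {
    "species_a": "TT" + guide + "TGG" + "AA",                   # perfect site
    "species_b": "TT" + "CCTTACAGATTACAGATTAC" + "AGG" + "AA",  # 2 mismatches
}
for name, seq in species.items():
    print(name, best_site(guide, seq))
```

Run over aligned mitogenomes of several fish species, a screen like this would highlight the guides (such as those in tRNA-Gly) whose sites accumulate mismatches.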
Compared with the mitoenrichment method, the targeted mitosequencing
approach tends to shorten the life of regular flow cells, and of flongle
flow cells in particular, presumably because the dephosphorylated DNA may
block pores. In contrast, the mitoenrichment method worked well on a
flongle, making it an ideal approach for de novo sequencing that, when
samples are run individually, reduces the cost per sample almost tenfold
compared with the price of a regular flow cell. However, a refrigerated
centrifuge is still needed for mitochondrial isolation, and flongle flow
cells have a very short shelf life, which limits the scope for
improvisation, especially in remote places. All analyses in this study
were carried out on a regular computer using standard analysis software
within Geneious Primer 2021, with the exception of Guppy 5 sup
basecalling, which was performed on Google Colab with a 10 USD per month
subscription that provided access to a high-end GPU, completing
basecalling at a rate of >10⁶ bp s⁻¹.
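The basecalling rate quoted above translates directly into turnaround time. A minimal Python sketch, assuming the throughput stays constant at 10⁶ bp s⁻¹ for the whole run; the run yields below are hypothetical and not values from this study.

```python
def basecalling_hours(total_bases, bases_per_second=1e6):
    """Wall-clock hours to basecall total_bases at a constant throughput."""
    if bases_per_second <= 0:
        raise ValueError("bases_per_second must be positive")
    return total_bases / bases_per_second / 3600


# Hypothetical run yields in bases (not results from this study).
for label, yield_bp in (("0.5 Gb run", 0.5e9), ("5 Gb run", 5e9)):
    print(f"{label}: {basecalling_hours(yield_bp):.1f} h")
```

Under this assumption even a 5 Gb run would be basecalled in under 1.5 h, so GPU time is not the bottleneck of the workflow.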
At the time of these experiments, no barcoding kit or protocol for
multiple samples was available, so it was more practical to sequence each
mitogenome in a separate run to guarantee good results, either reusing a
flow cell up to five times (116 USD per mitogenome; sequencing cost
excluding labor and reagents) or using the flongle version of the flow
cell for the mitoenrichment workflow (67 USD per mitogenome, excluding
labor and reagents). Advances that make barcoded, multiplexed runs
possible with the same yield will reduce the cost significantly while
increasing throughput. Keraite et al. (2022) ran multiple human samples
together by using different guide RNAs for each sample. This is less
feasible for de novo sequencing of fish mitogenomes, given their
diversity, but could be useful for multiple samples of the same species.
By comparison, the widely used genome skimming approach on an Illumina
NovaSeq platform can run 200 samples at a sequencing cost of 140 USD per
sample, which could be decreased to 28 USD per sample by pooling 1,000
samples and aiming for 100× on-target read depth (Margaryan et al.,
2021). However, this approach is only worthwhile when pooling many
different samples, which requires considerable time and effort, is
likely to involve coordination among many institutions, and may rely on
archived samples of compromised quality. In addition, genome skimming
requires intense computational power, especially for deep sequencing
aimed at high coverage, which can produce 5 Gb per specimen (Margaryan
et al., 2021). This may be impractical for many researchers and
laboratories and may take weeks to process, whereas the approaches shown
here can be completed within two days, from DNA extraction to mitogenome
annotation. Moreover, typical read lengths on short-read platforms range
from 150 to 300 bp, so this approach is expected to be more prone to
incorporating nuclear copies of mitochondrial DNA (Numts) into the
contig, resulting in spurious or inaccurate sequences. Nevertheless, we
have shown here that when the DNA is degraded (e.g. eulachon, Table 3),
genome skimming is the only alternative. In that case, researchers have
to accept the potential contamination with Numts, which may be offset by
a high proportion of mtDNA, likely resulting from the degradation of
linear DNA. Genome skimming is therefore probably the better approach
for archived museum samples.
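The per-sample costs quoted above follow from spreading a fixed sequencing cost over the number of samples and, for the ONT workflows, over the number of times a flow cell is reused. The Python sketch below illustrates that arithmetic only; it assumes cost scales purely with this division, ignoring labor, reagents and yield losses. The check on the Margaryan et al. (2021) figures is our own back-calculation: 140 USD × 200 and 28 USD × 1,000 both come to 28,000 USD, consistent with a similar total cost in both scenarios, but that total is inferred here rather than taken from that study.

```python
def per_sample_cost(run_cost_usd, samples_per_run, runs_per_flow_cell=1):
    """Sequencing cost per sample when a fixed cost is shared.

    run_cost_usd is the cost of the flow cell (or sequencing run), shared by
    samples_per_run multiplexed samples and by runs_per_flow_cell reuses.
    """
    if samples_per_run < 1 or runs_per_flow_cell < 1:
        raise ValueError("samples_per_run and runs_per_flow_cell must be >= 1")
    return run_cost_usd / (samples_per_run * runs_per_flow_cell)


# Total implied by the NovaSeq figures of Margaryan et al. (2021) (our inference).
implied_run_cost = 140 * 200
print(implied_run_cost)                          # 28000
print(per_sample_cost(implied_run_cost, 200))    # 140.0
print(per_sample_cost(implied_run_cost, 1000))   # 28.0
```

For the ONT workflows, `run_cost_usd` would be the flow cell price and `runs_per_flow_cell` the number of reuses (up to five here); barcoding would raise `samples_per_run` and lower the cost per mitogenome accordingly.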
Besides supplementing public databases with the standard mitochondrial
metabarcoding genes (12S, 16S, COI and, to a lesser extent, Cytb or
ND2), complete mitogenomes include other regions (e.g. the D-loop) that
may be relevant for closely related species, such as those with more
recent evolutionary histories.
In addition, the availability of complete mitogenomes will enhance
multi-marker metabarcoding efforts (Leite et al., 2021), provide
resources for population mitogenomics (e.g. the hypervariable non-coding
region found in this study among three individuals of Pacific hake), and
offer insight into radiations and other evolutionary processes,
including possible gene rearrangements. With deeper sequencing and
further research, complete mitogenomes may even shed light on
mitochondrial disorders and diseases in wildlife.
The targeted mitosequencing approach could be further optimized by
adopting some of the features used by Keraite et al. (2022). They
pre-enriched for circular DNA using exonuclease V, as we did here for the
mitoenrichment approach, obtaining a 27–49% increase in reads (useful
for high-integrity DNA only). They also performed a proteinase K
digestion after Cas9 cleavage to remove the Cas9 enzymes bound to the
DNA, which avoids the directional bias seen in our mitosequencing
workflow and increased the yield of full-length reads twofold.
Further enrichment could possibly be achieved at the sequencing step,
without additional bench work, by using the adaptive sampling capability
of nanopore sequencers (Martin et al., 2021; Payne et al., 2021). This
process requires a reference sequence of the target species or a related
species. We speculate that supplying all available fish mitogenomes as
references to enrich for could improve the on-target yield.
However, adaptive sampling requires greater computational power and
would need to be validated to ensure that it does not bias the results.
Alternatively, mtDNA could be enriched with a custom myBaits Mito
Targeted Sequencing Kit (Arbor Biosciences), although this is a less
universal, more complex and longer procedure (Zascavage et al., 2019).
Size selection could also be achieved by gel extraction (e.g. on the
BluePippin platform), but this instrument was not available and was not
tested in this study.
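The accept-or-reject logic behind adaptive sampling can be illustrated with a toy model. Real implementations, in MinKNOW or tools such as readfish, basecall the first part of each read in real time and align it to the reference before deciding whether to keep sequencing or eject the molecule; the Python sketch below instead uses exact k-mer matches against an index built from reference mitogenomes, and its function names, the k-mer size of 15 and the 20% acceptance threshold are illustrative assumptions only. It shows why the approach depends on references of the target or related species: reads from taxa absent from the index share few k-mers with it and would be ejected.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k=15):
    """k-mer set from both strands of every reference mitogenome."""
    index = set()
    for ref in references:
        ref = ref.upper()
        index |= kmers(ref, k) | kmers(reverse_complement(ref), k)
    return index


def decide(read_prefix, index, k=15, min_fraction=0.2):
    """'sequence' to keep reading, 'unblock' to eject, 'wait' if too short."""
    read_kmers = kmers(read_prefix.upper(), k)
    if not read_kmers:
        return "wait"
    hits = sum(1 for kmer in read_kmers if kmer in index)
    return "sequence" if hits / len(read_kmers) >= min_fraction else "unblock"


# Random sequences stand in for a reference mitogenome and a nuclear read.
rng = random.Random(1)
mito = "".join(rng.choice("ACGT") for _ in range(2000))
nuclear = "".join(rng.choice("ACGT") for _ in range(400))
index = build_index([mito])
print(decide(mito[100:500], index))                      # read from the target
print(decide(reverse_complement(mito[100:500]), index))  # target, other strand
print(decide(nuclear, index))                            # off-target read
```

Adding more fish mitogenomes to the index would broaden what is accepted, at the cost of a larger index and more computation per decision, which mirrors the trade-off noted above.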
A major potential advantage of the MinION and the associated
single-flow-cell Oxford Nanopore sequencers is their portability outside
a traditional laboratory. Besides the MinION itself, the necessary
instruments and equipment (e.g. laptop, pipettes, minicentrifuge, small
thermal cycler, vortex, plasticware) could easily be transported in a
couple of suitcases. In fact, most experiments in this study were
carried out with a portable setup in a home office during the COVID-19
pandemic, showing that laboratory requirements can be kept to a minimum.
Some of the reagents and flow cells listed in the Methods section are
more limiting because they must be stored in a refrigerator or freezer,
but these are often available on research vessels and at remote field
stations, or portable refrigeration could be used. In the field, the
targeted mitosequencing approach could be adapted for in situ use with
very little equipment by replacing the phenol-chloroform DNA
purification with a simpler HMW DNA extraction (e.g. Monarch HMW kit,
New England Biolabs). The mitoenrichment approach uses the Rapid
Sequencing Kit SQK-RAD004 (ONT). The Field Sequencing Kit (SQK-LRK001)
is essentially the same, but its chemistry is dehydrated so that it can
be used in the field without a freezer. This kit might simplify in situ
sequencing; however, it was not tested in this study, and further
investigation is needed to determine its yield. The portability of these
approaches makes it possible not only to take these resources to the
field but, in doing so, to take advantage of sample freshness for optimal
results and to produce data in a timely manner.
In summary, the targeted mitosequencing and mitoenrichment approaches,
paired with the portable MinION sequencer, give rapid, cost-effective
results for the generation of whole mitogenomes, which are increasingly
important for exploring biological diversity in environmental DNA
studies. Mitogenomes can be generated ad hoc for one to a few samples at
a time with little computational effort. Future improvements in sample
multiplexing or sequencing devices will enable high-throughput
sequencing of many mitogenomes at once. The methods outlined here rely
heavily on high-quality, high molecular weight (HMW) DNA. We strongly
recommend targeted mitosequencing with the gRNAs designed in this study
for all bony fishes, whereas mitoenrichment is preferable for distant
taxa for which a group-specific nCATS (nanopore Cas9-targeted
sequencing) assay has not been developed. Genome skimming is recommended
when only degraded DNA is available (e.g. museum specimens in ethanol).
We believe these approaches will make the generation of reference
mitogenomes accessible to many researchers worldwide.