Importantly, the success of this approach for generating whole mitogenomes depends on the availability of high-quality material and on proper handling that preserves the integrity of high molecular weight (HMW) DNA. In this study, we applied a modified phenol-chloroform method (Ramón-Laca et al., 2021), either directly to crude lysate or after mitochondria isolation, that requires no pipetting steps or other manipulation of the DNA and thereby avoids DNA fragmentation. Preventing DNA shearing is crucial for avoiding nuclear copies of mitochondrial DNA (Numts) and for preserving gene order, as well as for hampering the linearization of the circular mitochondrial genomes, which would decrease the amount of mtDNA remaining after the nDNA depletion step of the enrichment process. High concentrations of target DNA (i.e. mitochondrial DNA) are required to achieve the deep read coverage necessary to overcome the error rates of Oxford Nanopore sequencing. In this study, we achieved high DNA concentrations while maintaining good DNA quality using phenol-chloroform extractions with a phase-lock (Table 2). However, good DNA handling is not the only factor that determines DNA integrity. The range of DNA integrity observed across the differently preserved samples in this study demonstrates how quickly DNA degrades. We encourage researchers either to preserve samples on dry ice or at -80°C as soon as possible and to perform the extractions soon after collection, or to preserve the DNA from the first minute in Longmire buffer at room temperature; in the latter case, however, only the targeted mitosequencing approach is possible because the cells and organelles are already lysed and the DNA is in suspension.
The results of the enrichment process (proportion of sequences on target, read depth, fragment length, and the method that can be used) depend heavily on the type and level of degradation of the tissue, and thus on the quality and quantity of the DNA and on the enrichment treatment. The genome skimming results demonstrated that the proportion of reads on target does not by itself guarantee a reliable mitogenome; rather, coverage and fragment length determine a good result. The amount of mitochondrial DNA found in skeletal muscle (~0.3% of gDNA) already shows some level of enrichment when compared to the expected 0.1% of the total DNA (Robin & Wong, 1988). Purifying DNA from isolated mitochondria further enriched the proportion of target DNA two-fold in heart and 26-fold in skeletal muscle. The effect of the nDNA depletion was not measured in this study but was observed empirically as a decrease in the total amount of DNA in the sample (results not shown). When the same Chinook sample was compared using the two enrichment methods, the targeted mitosequencing approach more than tripled the proportion of reads on target, escalating by 67 to 196-fold (Table 2).
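As a worked illustration of the fold values quoted above, enrichment can be expressed as the ratio between two on-target proportions. The short Python sketch below is not part of the study's workflow; the function name `fold_enrichment` is hypothetical, and only the 0.3% and 0.1% figures are taken from the text.

```python
def fold_enrichment(fraction_after: float, fraction_before: float) -> float:
    """Ratio of the on-target fraction after enrichment to the fraction before."""
    if fraction_before <= 0:
        raise ValueError("fraction_before must be positive")
    return fraction_after / fraction_before


# Figures from the text: mtDNA is ~0.3% of gDNA in skeletal muscle,
# against an expected 0.1% of total DNA (Robin & Wong, 1988).
muscle_vs_expected = fold_enrichment(0.003, 0.001)
print(f"Skeletal muscle vs. expected baseline: {muscle_vs_expected:.1f}-fold")
```

The same ratio applies to any pair of treatments, provided both proportions are computed over the same denominator (total reads or total bases).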
To date, the average depth of reads for the mitochondrial genome using PCR-free genome skimming approaches with long reads is low [e.g. 9× for the Brazilian buffy-tufted-ear marmoset Callithrix aurita, 78× for fresh liver of a rodent of the genus Melanomys (Franco-Sierra & Díaz-Nieto, 2020; Malukiewicz et al., 2021)]. Using Nanopore sequencing, the present study achieved only 0.08% of sequences on target and 8.9× coverage (with some areas not covered) in a genome skimming analysis of Merluccius productus. Using short-fragment data from genome skimming with HiSeq technology (Illumina), Margaryan et al. (2021) obtained an average median depth of coverage of 1,170.8× for the 192 assemblies of Danish vertebrate species, ranging from ca. 27× to 12,208×, while the fraction of mtDNA reads was around 0.54%, ranging from ca. 0.005% to 5.62% depending on the tissue preservation conditions. The overall coverage of the contig was dramatically enhanced in this study, from hundreds to thousands of reads depending on the tissue, extraction method and treatment (Table 3), with the targeted mitosequencing method providing the deepest coverage. The deep coverage obtained for targeted mitosequencing is, not surprisingly, remarkably higher than that of other Cas9-targeted assays on nuclear regions of interest. Gilpatrick et al. (2020) obtained a coverage of 80× for a nuclear gene of similar length to the mitogenomes (18 kbp), but 680× when targeting the region with multiple cuts. Since the number of available mitochondrial copies is up to 500 times the number of nuclear copies, the results are comparable to those of the latter study. The cutting sites for the targeted mitosequencing approach are notably less covered than the rest of the mitogenome (File S3.3), but these sites are precisely the least variable regions of the mitogenome among fish.
Directional bias is created at the cut site by the retention of bound Cas9 ribonucleoprotein complexes on cleaved DNA fragments that are distal to the PAM site.
While Sanger and massively parallel sequencing of short reads can fail to resolve challenging tandem repeat regions found in the non-coding regions of the mtDNA of some organisms, such as the tandem repeat insertion in the control region of eulachon or the 44 bp polyG stretch found in S. leucopsarus in this study (Filipović et al., 2021; Kinkar et al., 2020), long-read sequencing can overcome this obstacle. We expect a few scattered errors in the homopolymeric regions of the mitogenomes generated herein (sequencing error rates were ≥2% at the time of this study with Guppy5sup basecalling and R9.4 flow cells), but these should be corrected as basecalling algorithms evolve or, ultimately, with the new sequencing kits and flow cells that increase yield and accuracy (SQK-LSK114 and flow cell R10.4), which were released as this manuscript was being finalized and have been shown to reach >99% accuracy. These improvements will presumably allow the detection of single nucleotide variants and possibly heteroplasmy (Keraite et al., 2022). On the other hand, the genomes generated in this study are likely to be more accurate than previous efforts because the methods overcome the difficulty of sequencing the tandem repeats, long insertions, repetitions and rearrangements that can occur in vertebrate genomes (Formenti et al., 2021), provide read depth, favor long DNA fragments and increase the proportion of the target. To circumvent sequencing errors, we recommend a read depth of 50-75× for reliable mitogenome generation at the current error rate with R9.4 flow cells. Srivathsan et al. (2021) demonstrated that 25-50× is sufficient for R10.3 flow cells, and this requirement can only decrease with further improvements.
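To make the depth recommendation concrete, the sequencing effort it implies can be estimated from the standard relation that mean depth equals on-target bases divided by target length. The Python sketch below is illustrative only: the function name is hypothetical, the 16,500 bp mitogenome length is an assumed typical vertebrate value rather than a figure from this study, and the 10% on-target fraction is a hypothetical enriched scenario. The 0.08% fraction is the genome skimming value reported earlier in this section.

```python
def total_bases_needed(target_depth: float, genome_length_bp: int,
                       on_target_fraction: float) -> float:
    """Total sequenced bases required to reach a mean depth on the target.

    Mean depth = on-target bases / target length, so the total yield is
    depth * length divided by the fraction of bases that land on target.
    """
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return target_depth * genome_length_bp / on_target_fraction


# ASSUMPTION: typical vertebrate mitogenome length, not measured in this study.
MITOGENOME_LENGTH_BP = 16_500

scenarios = {
    "genome skimming, 0.08% on target": 0.0008,
    "hypothetical enrichment, 10% on target": 0.10,
}
for label, fraction in scenarios.items():
    for depth in (50, 75):
        bases = total_bases_needed(depth, MITOGENOME_LENGTH_BP, fraction)
        print(f"{label}: {depth}x needs ~{bases / 1e6:,.1f} Mb of total yield")
```

The comparison shows why the on-target proportion matters: at a fixed depth, the required yield scales inversely with the fraction of bases on target.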
The mitoenrichment method can be applied directly to material from mitochondrial isolation of any fish and is particularly useful when no reference mitogenome of the species of interest, or of a closely related species, is available from which to design Cas9 guides for the targeted mitosequencing method. However, mitochondrial isolation can decrease the amount of mitochondrial DNA, and some nDNA can co-purify. Exonuclease V digestion helps reduce the noise caused by the nDNA, but the digestion is not complete. The rapid sequencing kit (ONT) uses a transposase and a rapid adapter ligation, which makes the library preparation brief and straightforward. We speculate that this method will also work on other phyla provided that the mitochondrial DNA of the target is sufficiently represented relative to that of microorganisms (i.e. little contamination from symbionts or gut microbiota), as seen in preliminary work performed on bivalves (results not shown, Ramón-Laca personal observation). In contrast, the Cas9 guide RNAs developed for the targeted mitosequencing method will be useful for most fish, particularly the guides that cut the 12S and 16S rRNA genes. The two guides located on the tRNA-Gly may be less universal since this region is not as conserved and mismatches may occur. Both methods take the same time, around 2.5 h for library preparation, from DNA treatment after the DNA extraction to loading the library on the flow cell, but the mitoenrichment includes an extra step prior to the DNA extraction to isolate the mitochondria.
The targeted mitosequencing approach tends to shorten the life of regular flow cells, and particularly of the flongle, compared to the mitoenrichment method, presumably because the dephosphorylated DNA may block pores. On the other hand, the mitoenrichment method worked well on a flongle, making it an ideal approach for de novo sequencing that reduces the cost per sample almost 10-fold relative to the price of a regular flow cell when samples are run individually. However, a refrigerated centrifuge is still needed for the mitochondria isolation, and flongle flow cells have the disadvantage of a very short shelf life, which limits the room for improvisation, especially in remote places. All the analyses in this study were carried out on a regular computer using standard analysis software within Geneious Prime 2021, with the exception of the Guppy5sup basecalling, which was performed using Google Colab with a subscription of 10 USD a month; this allowed the use of a high-end GPU that completed basecalling at a rate of >10⁶ bp s⁻¹.
At the time of these experiments, no barcoding kit or protocol for multiple samples was available, and it was more practical to sequence each mitogenome in a separate run to guarantee good results, either reusing the flow cell up to five times (116 USD per mitogenome, sequencing cost without labor and reagents) or using the flongle version of the flow cell for the mitoenrichment workflow (67 USD per mitogenome without labor and reagents). Advancements that make multiplexed runs using barcodes possible with the same yield will reduce the cost significantly while increasing the throughput. Keraite et al. (2022) ran multiple samples of human origin by using different guide RNAs for each sample. However, this is less attainable with de novo sequencing of fish mitogenomes given their diversity, although it could be useful for multiple samples of the same species. By comparison, the widespread genome skimming approach on a NovaSeq platform (Illumina) can run 200 samples at a sequencing cost of 140 USD per sample, which could be decreased to 28 USD by pooling 1000 samples and aiming for 100× on-target read depth (Margaryan et al., 2021). However, this approach is only worthwhile when pooling many different samples, which entails time and great effort and is likely to involve coordination among many institutions and the use of archived samples whose quality may be compromised. In addition, considerable computational power would be necessary for genome skimming, especially with deep sequencing approaches that can produce 5 Gb per specimen to gain coverage (Margaryan et al., 2021); this may be impractical for many researchers and laboratories and may take weeks to process, whereas the approaches shown here can take two days from DNA extraction to mitogenome annotation.
Moreover, typical read lengths of short-fragment platforms range from 150 to 300 bp, and thus this approach is expected to be more prone to including nuclear copies of mitochondrial DNA (Numts) in the contig, resulting in illegitimate or inaccurate sequences. However, we have shown here that when the DNA is degraded (e.g. eulachon, Table 3), genome skimming is the only alternative, in which case the researcher must accept potential contamination with Numts; this may be offset by a high proportion of mtDNA, which is likely due to degradation of linear DNA. Genome skimming is therefore probably the better approach for archived samples from museums.
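The per-sample Illumina costs quoted in the comparison above follow the usual amortization of a run cost over the number of pooled samples. The Python sketch below illustrates that arithmetic only; the fixed run cost is an assumption inferred from the quoted 140 USD for 200 samples, not a price reported in the text, and the function name is hypothetical.

```python
def cost_per_sample(run_cost_usd: float, n_samples: int) -> float:
    """Sequencing cost per sample when one run is shared by n_samples."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return run_cost_usd / n_samples


# ASSUMPTION: a single fixed run cost, inferred from 140 USD x 200 samples.
ASSUMED_RUN_COST_USD = 140 * 200

for n in (200, 1000):
    per_sample = cost_per_sample(ASSUMED_RUN_COST_USD, n)
    print(f"{n} pooled samples: {per_sample:.0f} USD per sample")
```

Under this assumption the two quoted figures (140 USD and 28 USD) correspond to the same total run cost, which is why the saving only materializes when many samples can be pooled.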
Besides supplementing public databases with the standard metabarcoding mitochondrial genes (12S, 16S, COI, and to a lesser extent Cytb or ND2), there are other genes (e.g. D-loop) that may be relevant to closely related species, such as those with more recent evolutionary histories. In addition, the complete mitogenome availability will enhance multiple marker metabarcoding efforts (Leite et al., 2021), providing the resources for population mitogenomics (e.g. hypervariable non-coding region found in this study among three different individuals of Pacific hake) and will provide insightful knowledge about radiation and evolutionary processes with possible gene rearrangement. With deeper sequencing and research, it may even help gain perspective about mitochondrial disorders and diseases of wildlife.
The targeted mitosequencing approach can be further optimized by introducing some of the features used by Keraite et al. (2022). They pre-enriched by selecting the circular DNA using exonuclease V, as we did here for the mitoenrichment approach, obtaining a 27-49% increase in reads (useful for high-integrity DNA only). They also performed a proteinase K digestion after cleaving the DNA to remove the Cas9 enzymes bound to the DNA, which cause the directional bias seen in our mitosequencing workflow; this increased the yield of full-length reads 2-fold. Further targeted enrichment could possibly be attained at the sequencing step, without any additional lab-bench effort, by using the adaptive sampling capability of nanopore sequencers (Martin et al., 2021; Payne et al., 2021). This process requires a reference sequence of the target or of a related species. We speculate that if all fish mitogenomes were uploaded as references to enrich for, the on-target yield could be improved. However, this tool requires greater computational power at the sequencing stage and would need to be validated to ensure it does not bias the results. Alternative methods to enrich the mitochondrial DNA yield could also be pursued using the custom myBaits Mito Targeted Sequencing Kit (Arbor Biosciences). However, this is a less universal, more complex and longer procedure (Zascavage et al., 2019). Alternatively, size selection could be achieved with gel extraction (e.g. BluePippin platform), but this instrument was not available for this study and was not tested.
A great potential advantage of the MinION sequencing platform and associated single flow cell Oxford Nanopore sequencers is their portability outside of a traditional laboratory. The necessary instruments and equipment (e.g. laptop, pipettes, minicentrifuge, small thermal cycler, vortex, plasticware) can easily be transported in a couple of suitcases. In fact, the experiments in this study were mostly undertaken using a portable setup in a home office during the COVID-19 pandemic, showing that the lab requirements can be kept to a minimum. Some of the reagents and the flow cells listed in the methods section are more limiting because they need to be stored in either a fridge or a freezer, but these are often readily accessible on research vessels and at remote field stations, or portable refrigeration could be used. In the field, the targeted mitosequencing approach could be modified to be performed in situ with very little equipment by replacing the phenol-chloroform DNA purification with a simpler HMW DNA extraction (e.g. Monarch HMW kit, New England Biolabs). The mitoenrichment approach uses the Rapid sequencing kit SQK-RAD004 (ONT). The field sequencing kit (SQK-LRK001) is fundamentally the same, but the chemistry is dehydrated to allow its use in the field in the absence of a freezer. This kit might simplify and enable in situ sequencing; however, further investigation is necessary to determine the yield, as it was not tested in this study. The portability of this approach affords the opportunity not only to take these resources to the field but, in doing so, to take advantage of the freshness of samples to yield optimal results and to produce data in a timely manner.
In summary, the targeted mitosequencing and mitoenrichment approaches, paired with the portable MinION sequencer, give rapid, cost-effective results for the generation of whole mitogenomes, which are increasingly important for explorations of biological diversity in environmental DNA studies. Mitogenomes can be generated ad hoc on one to a few samples at a time with little computational effort. Future improvements in sample multiplexing or in sequencing devices will enable high-throughput sequencing of many mitogenomes at once. The methods outlined here rely heavily on high-quality, high molecular weight (HMW) DNA. We strongly recommend targeted mitosequencing using the gRNAs designed in this study for all bony fishes, whereas mitoenrichment would be preferred for distant taxa for which a group-specific nCATS assay has not been developed. Genome skimming is recommended when only degraded DNA is available (e.g. museum specimens in ethanol). We believe these approaches will make the generation of reference mitogenomes accessible to many researchers worldwide.