Introduction
Fungi have traditionally been identified based on macro- and micromorphological features of fruiting body specimens or pure cultures. The introduction of molecular techniques in the late 1980s represented a significant leap forward in fungal identification. In particular, PCR amplification combined with Sanger sequencing of the nuclear 18S (SSU) and 28S (LSU) ribosomal RNA (rRNA) genes and the nuclear ribosomal internal transcribed spacer (ITS) region from fungal tissue (e.g., lichen thalli, lesions in tissue, cultures from environmental samples and ectomycorrhizal root tips) quickly became popular and offered unprecedented taxonomic resolution. Common uses included species- and genus-level identification, analysis of cryptic species, and phylogenetic assessment of larger fungal clades as well as the kingdom Fungi at large (Gherbawy & Voitk, 2010). Later on, the identification of multiple fungi from more diverse substrates, including soil, plant roots, and water, became possible by adding a cloning step of amplicons prior to sequencing. However, these studies usually operated with tens to low hundreds of reads, rarely reaching the thousands required to appropriately estimate fungal diversity in soils (Taylor et al., 2014). Accordingly, sequences and operational taxonomic units (OTUs) were usually handled manually or using specific programs, with no need for bioinformatics tools.
The development of high-throughput sequencing (HTS) methods such as 454 pyrosequencing (454 Inc., obsolete), Illumina sequencing (Illumina Inc., www.illumina.com) and Ion Torrent (Thermo Fisher Scientific Inc., www.thermofisher.com) transformed fungal identification capacity in the 2000s (Jumpponen & Jones, 2009). These so-called next- or second-generation HTS methods increased the number of reads by 2-6 orders of magnitude and the number of simultaneously processable samples by 1-2 orders of magnitude. These metabarcoding methods (cf. Taberlet et al., 2012) enabled exhaustive estimation of fungal diversity from environmental DNA (eDNA) at the scale of individual samples and facilitated global-scale comparisons (Tedersoo et al., 2014; Sun et al., 2021). Yet, these second-generation platforms, as well as the more recent DNA nanoball sequencing (DNBseq; MGI-Tech Inc., www.mgitech.com), could only sequence short (<550 bases) fragments of the genetic markers, resulting in a loss of taxonomic resolution and phylogenetic information as well as difficulties in identifying technical artefacts compared with longer Sanger reads.
In the 2010s, long-read, third-generation HTS platforms such as PacBio single-molecule real-time (SMRT) sequencing (Pacific Biosciences Inc., www.pacbio.com) and nanopore sequencing (Oxford Nanopore Technologies Inc., https://nanoporetech.com) were introduced (van Dijk et al., 2019). Owing to low sequencing depth (tens of thousands of reads in total, resulting in only hundreds rather than thousands of reads per sample) and high raw error rates (12-20%), these methods could not initially compete with short-read HTS platforms. However, both technologies made a great leap forward in 2020, when PacBio Sequel II instruments became broadly available and new solutions were developed to greatly reduce error rates in nanopore sequencing (Karst et al., 2021; Tedersoo et al., 2021a). These long-read technologies and synthetic long reads provide high-quality sequence data for amplicons of up to 5 kb, which enables bridging variable and conserved fragments of one or more genes in a single sequencing round as well as resolving alleles and haplotypes (Callahan et al., 2021; Tedersoo et al., 2021a).
Along with the rapid development of HTS methods, bioinformatics platforms and analytical resources have evolved to match the computational needs imposed by large datasets. Metabarcoding approaches have been extensively reviewed in several recent studies with a focus on their conceptual foundation (Taberlet et al., 2018), pathogenic organisms (Piombo et al., 2021; Tedersoo et al., 2019), applications in mycology (Nilsson et al., 2018), eukaryotes more broadly (Ruppert et al., 2019), overall experimental planning (Zinger et al., 2019a), trade-offs among technology generations (Kennedy et al., 2018; Loit et al., 2019), and analytical pitfalls (Halwachs et al., 2017; Cristescu & Hebert, 2018). Here we review the available methods and propose best practices for designing and performing metabarcoding studies in fungi. We also compare the performance of several popular methods developed for bacteria to assess their suitability for fungi. The vast majority of our recommendations are relevant to prokaryotes, protists and metazoans alike.